Logging… Some Good Practices.

Floormind
8 min read · Nov 20, 2019

I was working on a compliance project to help develop a way of monitoring events across a web application. The requirement was to have visibility of events such as logins, logouts, and changes to user details (role changes, password changes, users being made inactive, users being archived).

Besides ensuring logs were generated where required across the application, I also built a proof of concept for centralised logging, which inspired me to write this post.

What this post is

  • The objective of this post is to describe, at a conceptual level, how we should approach logging.
  • It should be seen as a building block for how developers can take advantage of logging within their applications.

Why Log

Logging gives a transparent view of exactly what is happening within our applications at a specific point in time. Yes, developers will be able to find the root cause of a problem quicker, but logs are not only valuable to the engineers working on the product. Without them, we may miss extremely important events that matter to non-technical people such as Business Analysts, Product Owners, and Delivery Managers. Those events can inform the business's decision making as a whole, for example by helping prioritise improvements and fixes based on how often an issue occurs and its log severity.

As mentioned in the first paragraph, logging may also help us meet compliance requirements. The next few sections describe some of the main points to take into consideration when implementing logging.

Make Use Of Reliable Third Party Libraries

As Software Engineers, it may be extremely tempting to reinvent the wheel and create something cool from scratch that achieves the required functionality; the truth is it is best not to do that. Chances are something will be left out of the implementation, or the code will be inefficient and do more harm than good, such as slowing down the system or introducing new bugs.

Not having to reinvent the wheel also means we do not spend time creating something that is already out there: tested by other engineers, working, extremely stable, and actively maintained.

Standardise the logs (Use Log Severity Levels)

This is targeted toward the Developers/Software Engineers working on the applications. As logging becomes consistent across the applications, it is also extremely important to keep a few things about the data consistent. To elaborate, the team working on the application should agree on which log level applies to each kind of event being logged. Various third-party logging libraries provide specific methods for each log level (in log4net, the levels are Debug, Info, Warn, Error, and Fatal); the idea is to understand which log method should apply to the data being logged. For example, is a "user not found" message an info-level log or a warning-level log? Some systems may throw an exception in that situation and require the message to be logged as an error. Whatever the use case, the main point is to agree as a team on how log levels should be set.

The advantage is that when searching through a log file, or through applications such as ELK with the log level as a filter, the returned data will be close to an accurate result or count of what was searched. The business or the engineers can then do whatever needs to be done with that result.
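To make such a convention concrete, here is a minimal sketch using Python's standard logging module (the same idea applies to log4net or Serilog). The rule that a failed lookup during login is a warning rather than an error is just an illustrative team agreement, not a universal standard:

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("auth")

# Hypothetical team convention: a failed user lookup during login is a
# WARNING (the system handled it gracefully), a successful login is INFO.
def find_user(email, users):
    user = users.get(email)
    if user is None:
        logger.warning("User %s not found during login", email)
    else:
        logger.info("User %s logged in", email)
    return user
```

Because every failed lookup is consistently a warning, filtering the logs by level then gives a reliable count of failed login attempts.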

Format Data

Structure the logs so the data is human readable. There are tools available that can help parse log data into a more readable format, such as Logstash, and it is best to make use of them. A consistent format is important because it creates familiarity with what the log data will look like, reducing the "large intimidating log data" effect. If a consistent format is applied to the logs, reading each log line becomes a lot easier.

Beyond taking advantage of log parsers, logging libraries such as Serilog can be configured to log data in a JSON format, and since JSON is a key:value based notation, some people may find it a lot easier to read.

Having an object that represents what a log entry should consist of may be helpful. In Java, that might be a POJO class; in C#, a POCO class. These classes can contain properties holding information such as EventID, log level, class, method, log message, and so on. The POJO/POCO class can then be injected along with the logger, and whether logging an exception or a simple message, the same class is used to log the data. This keeps the log format consistent, and developers know exactly where to find the information they need because they are familiar with that object.
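A rough sketch of this idea in Python: a custom formatter that renders each record as one JSON object per line, with fields mirroring the POJO/POCO properties described above. The field names are illustrative choices, not a standard:

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line (illustrative sketch)."""
    def format(self, record):
        payload = {
            "level": record.levelname,        # the agreed severity
            "logger": record.name,            # which component logged it
            "message": record.getMessage(),   # msg with arguments interpolated
            "module": record.module,          # "where": the emitting module
            "func": record.funcName,          # "where": the emitting function
        }
        return json.dumps(payload)
```

Attached to a handler via `handler.setFormatter(JsonFormatter())`, every line then has the same shape, which also makes the logs trivial for tools like Logstash to parse.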

Provide Context

It is important to log as much information as possible, but it is even more important that the right context is provided with the data being logged. Context can help in many ways, such as specifying exactly what caused an error to occur in the application, or who made what change at a given time, and it makes the logged data make sense.

Context depends on the scenario. When logging data, some questions should come to mind:

  • How can value be added to this information?
  • Is the message helpful to the reader?
  • In the case of exception handling, is there an adequate amount of information to help narrow down the issue and get to the root cause?
  • What information should be included in this log message?

Examples of "not so good" log messages are:

  • User cannot be found (when trying to login)
  • Unable to delete record
  • User role has been changed

Context can be provided to these log messages by stating the who, what, and where: who being the current user of the application or the user affected by a change, what being what the user was trying to do at the time of the exception, and where being the location at which the error was raised.

With that context added, the above messages become:

  • User <user email> not found
  • Attempt to delete record <record ID> by user <user ID> was unsuccessful: <exception message>
  • Admin user <user ID> has changed the role of user: <user ID> from role: <role ID> to <new role ID>

These messages can also include where the exception occurred, such as the class name and method name of the executing code. Logging libraries will most likely provide the class where the log was created, as the most common pattern is to inject a generic interface of the logger with the class as the generic type.

The messages now contain meaningful information that can help software engineers get to the root cause of an issue, and business users such as stakeholders can see a history of activity in the application.
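As a sketch of the who/what/where pattern, here is a hypothetical delete operation logged with Python's standard logging module. `logger.exception` records the message at error level and attaches the stack trace automatically, which covers the "where":

```python
import logging

logger = logging.getLogger(__name__)  # "where": the module emitting the log

def delete_record(record_id, user_id, store):
    """Delete a record, logging who tried to do what if it fails (hypothetical example)."""
    try:
        del store[record_id]
    except KeyError:
        # "who" (user_id) and "what" (the failed delete on record_id);
        # logger.exception also appends the traceback for the "where".
        logger.exception("User %s failed to delete record %s", user_id, record_id)
        return False
    return True
```

Compare the resulting log line with a bare "Unable to delete record": the contextual version is immediately actionable.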

No Sensitive Data

Though it is best to log events with as much context as possible, it is also extremely important to be cautious about the actual information being logged along with the data, to ensure that information classified as sensitive is not logged.

It is very important to understand why care should be taken around sensitive data, as this ties into compliance regulation. To get this right, a clear understanding of the definition of sensitive data is needed. The definition may vary based on where the business operates; for instance, GDPR applies to countries in the EU, which means data collected and logged there must not break GDPR or other compliance regulations. Examples of sensitive data are:

  • Name
  • Home address
  • Email address
  • Identification or card number
  • Banking & Payment information
  • Passwords
  • Intellectual Property
  • Personal information about the business or owner of the application.
  • Any other personal information about the user of the application.
  • Health, Sexual orientation, Religious beliefs, Political beliefs and Race or ethnic origin

The data mentioned above may not apply to every application, and what counts as sensitive can be subjective, so it is important to always check with those in charge or a compliance team to ensure it is OK to have a given piece of information in the data logged.

Though the application may have the right to store personal data, that data also needs to be protected against potential threats. For instance, if a password or bank details are logged to a file, there is a risk of a malicious user reading them after gaining access to the machine where the log files are stored, or over the web by gaining access to any application that consumes the log files for presentation.

Data Masking

In some cases, including some of this sensitive data may be unavoidable, as it may be needed to provide context. In that case the data can be masked so that only part of the information is logged; for instance, only log the last four digits of a credit card number.
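A possible masking helper, assuming the only requirement is to keep the last four digits of a card number visible:

```python
def mask_card_number(card_number: str) -> str:
    """Replace all but the last four digits with asterisks (illustrative sketch)."""
    digits = card_number.replace(" ", "").replace("-", "")
    return "*" * (len(digits) - 4) + digits[-4:]
```

Applying the mask before the value ever reaches the logger means the full number never exists in any log file, rather than trying to scrub it afterwards.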

Encryption

Logs can also be encrypted for security. TLS (Transport Layer Security), the protocol that secures traffic between servers and web applications, can be used when transporting log data, protecting it against some of the potential threats web applications face by:

  • Ensuring the parties consuming the logs are authenticated.
  • Ensuring the data has not been tampered with and its integrity is intact.
  • As mentioned above, encrypting the data in transit.

Log To File

Logging to file is a preferred approach for many reasons, such as keeping a historical view of events. It gives Developers, Testers, and Product Owners clear insight into the technical events within the application, and it allows the business as a whole to gather important information such as statistics.

Because log files are readable, they can easily be searched when debugging or for a general review of events. Logging to files also allows us to take advantage of centralised logging, which helps in searching and processing large amounts of log data, or in incorporating data science tooling.

Logging to files allows us to maintain historic events and data, but this can also become an issue: log files can grow too large, so an approach to cleaning up log data should be considered to ensure the files do not take up most of the storage on the server and affect other applications running on it. One approach is archiving or deleting log files that are older than a given date and considered irrelevant.
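One way to implement such a clean-up policy is time-based rotation, which most logging libraries support. A sketch using Python's `TimedRotatingFileHandler`; the file name "app.log" and the 30-day retention are just example choices:

```python
import logging
from logging.handlers import TimedRotatingFileHandler

# Roll the log file over at midnight and keep 30 dated backups;
# older files are deleted automatically instead of filling the disk.
handler = TimedRotatingFileHandler("app.log", when="midnight", backupCount=30)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))

logger = logging.getLogger("app")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("application started")
```

With rotation in place, retention becomes a configuration decision rather than a manual clean-up task.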

Centralised Logging

ELK has been mentioned a few times; it stands for Elasticsearch, Logstash, and Kibana. It is a centralised logging stack, and there are other tools that can be used, such as Splunk, Graylog, Datadog, and many more.

Centralised logging applications come with many benefits, such as:

  • Actively monitoring changes to your log files in real time.
  • Parsing log file data.
  • Processing large amounts of data.
  • Good user interfaces with search functionality.
  • Graphs that help make sense of the data.
  • Support for applications behind load balancing.
  • Authentication, so only permitted users can view the logs.
  • Alerts based on set thresholds (Slack, email, telephone, etc.).
  • Beyond files, monitoring of syslogs, metrics, network data, Windows events, etc.
  • Benefits for architectures such as microservices, where each service can produce its own log files and have them monitored.
  • Grouping log data into sections.
  • Data export (to file, email, etc.).
  • Extensions for AI and data science.

Round Up!

Though I have tried to cover most of the important aspects of logging, I would love to hear how other people in the field are making use of logging within their applications. Anything that hasn't been mentioned in this post is highly welcome.

Thank you and happy coding!
