This document highlights the risks that can be mitigated by regularly reviewing logs and makes concrete recommendations on how to do log review.
Taking into account the state of the art, the costs of implementation and the nature, scope, context and purposes of processing as well as the risk of varying likelihood and severity for the rights and freedoms of natural persons, the controller and the processor shall implement appropriate technical and organisational measures to ensure a level of security appropriate to the risk, including inter alia as appropriate: [...] a process for regularly testing, assessing and evaluating the effectiveness of technical and organisational measures for ensuring the security of the processing.
2 § The care provider shall, through the management system, ensure that [...] 4. actions can be traced to a user (traceability) in information systems that are wholly or partially automated.
Mapping to ISO 27001 Controls
- A.12.4.1 "Event Logging"
- A.12.4.3 "Administrator and Operator Logs"
Compliant Kubernetes captures application logs and audit logs in a tamper-proof logging environment, which we call the service cluster. By "tamper-proof", we mean that even a complete compromise of production infrastructure does not allow an attacker to erase or change existing log entries, as would be required to hide their activity and avoid suspicion.
Attackers can still inject new, "weird" log entries. However, that would not remove their tracks and would only raise more suspicion.
Logs only help with information security if they are regularly reviewed for suspicious activity. Use log review to catch "unknown unknowns". For known failure modes, e.g., a fluentd Pod restarting, prefer alerts.
Periodically reviewing logs can mitigate the following information security risks:
- Information disclosure: Regularly reviewing logs can reveal an attack attempt or an ongoing attack.
- Downtime: Regularly reviewing logs can reveal misbehaving components (e.g., Pod restarts, various errors) and inform fixes before they lead to downtime.
- Silent corruption: Regularly reviewing logs can reveal data corruption.
How to do log review
By review period, we mean the time elapsed since the last review of the logs, e.g., 30 days.
Aim for a review which is both wide and deep. By wide we mean that you should vary the time interval, time point, filters, etc., when reviewing log entries. By deep we mean that you should actually read and try to understand a sample of logs.
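Going wide is easier if you pick time points and intervals at random within the review period rather than always looking at the same view. The Python sketch below shows one way to do that. It assumes you record the date of the last review. The function names and the 15-minute to 1-day window bounds are illustrative choices, not part of Compliant Kubernetes.

```python
import random
from datetime import datetime, timedelta, timezone


def review_period(last_review: datetime, now: datetime) -> timedelta:
    """Time elapsed since the last log review, e.g. 30 days."""
    return now - last_review


def random_windows(
    last_review: datetime,
    now: datetime,
    count: int,
    rng: random.Random,
    min_len: timedelta = timedelta(minutes=15),  # illustrative lower bound
    max_len: timedelta = timedelta(days=1),      # illustrative upper bound
) -> list[tuple[datetime, datetime]]:
    """Pick `count` windows of random length at random points in the review period."""
    period = review_period(last_review, now)
    windows = []
    for _ in range(count):
        # Random interval length, capped so the window fits inside the period.
        length = min(min_len + (max_len - min_len) * rng.random(), period)
        # Random start point such that the window ends no later than `now`.
        start = last_review + (period - length) * rng.random()
        windows.append((start, start + length))
    return sorted(windows)


if __name__ == "__main__":
    rng = random.Random()  # deliberately unseeded: each review picks new windows
    last = datetime(2024, 1, 1, tzinfo=timezone.utc)
    now = datetime(2024, 1, 31, tzinfo=timezone.utc)
    for start, end in random_windows(last, now, 5, rng):
        print(start.isoformat(), "->", end.isoformat())
```

The generator is left unseeded on purpose, so each review produces different windows and the review stays hard to predict.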
- Open a browser and navigate to the Compliant Kubernetes logs of the cluster you are reviewing. This functionality is currently offered by Kibana and Elasticsearch.
- Search for the following keywords on all indices -- i.e., search over each index pattern -- over the last review period: [...] "unknown". Also search for a few keywords you recently encountered during your work, e.g., "not found"; be creative and unpredictable.
- Vary the time point, the time interval, filters, etc.
- Go wide: For each query (index pattern, keyword, timepoint, time interval and filter combination), look at the timeline and see if there is an unexpected increase or decrease in the count of log lines. If you find any, focus your attention on those.
- Go deep: For each query, sample at least 10 log entries, read them and make sure you understand what they mean. Think about the following:
    - What are potential causes?
    - What are potential implications?
    - Time: Do the entries appear periodically or randomly?
    - Space: Does a specific component trigger them? Is the entry generated by the platform or the application?
- If anything catches your attention, vary the time point, time interval and filters to understand whether the log entry is a risk indicator. Look for unknown unknowns. Any failures that show a significant increase, especially authentication failures, are risk indicators.
- Contact the person owning the component, e.g., the application developer or Compliant Kubernetes architect, to better understand whether the entry is suspicious. Perhaps it is due to a recent change -- as indicated by an operator log -- and indicates no risk.
- If you find suspicious activity, escalate.
- If the log entry is due to a bug in Compliant Kubernetes, file an issue.
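The keyword search and the "go wide" timeline can also be run as Elasticsearch queries, for example from Kibana's Dev Tools console. The Python sketch below builds a `_search` body that counts hits for one keyword per time bucket. Several details are assumptions: the index patterns `kubernetes*` and `kubeaudit*`, a timestamp field named `@timestamp`, and an Elasticsearch version that supports `fixed_interval` (7.2 or later). Only the keywords quoted in this guide are listed. Add the rest of your keyword list.

```python
import json

# Hypothetical index patterns; use the ones configured in your Kibana.
INDEX_PATTERNS = ["kubernetes*", "kubeaudit*"]
# Keywords quoted in this guide; extend with your own.
KEYWORDS = ["unknown", "not found"]


def keyword_query(keyword: str, period: str = "30d", interval: str = "1h") -> dict:
    """Elasticsearch _search body: count hits for `keyword` per time bucket."""
    return {
        "size": 0,  # only the timeline is needed here, not the hits themselves
        "query": {
            "bool": {
                # json.dumps wraps the keyword in double quotes, i.e. a phrase query
                # over the default fields of the index.
                "must": [{"query_string": {"query": json.dumps(keyword)}}],
                # Assumes the timestamp field is named @timestamp.
                "filter": [{"range": {"@timestamp": {"gte": f"now-{period}", "lte": "now"}}}],
            }
        },
        "aggs": {
            "timeline": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": interval}
            }
        },
    }


if __name__ == "__main__":
    # Print console-ready requests, one per (index pattern, keyword) combination.
    for pattern in INDEX_PATTERNS:
        for keyword in KEYWORDS:
            print(f"GET {pattern}/_search")
            print(json.dumps(keyword_query(keyword), indent=2))
```

Changing `period` and `interval` between runs is one way to vary the time interval, as the steps above recommend.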
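Spotting "an unexpected increase or decrease in the count of log lines" is a judgment call. A simple heuristic can still point you to the buckets worth reading first. The Python sketch below flags any bucket that lies more than a chosen factor above or below the median bucket count. The factor of 3 and the function names are illustrative assumptions, not a Compliant Kubernetes feature. The `doc_count` field is the standard per-bucket count in an Elasticsearch `date_histogram` response. The aggregation name `timeline` is a hypothetical name.

```python
from statistics import median


def bucket_counts(response: dict, agg_name: str = "timeline") -> list[int]:
    """Extract per-bucket hit counts from a date_histogram search response."""
    return [b["doc_count"] for b in response["aggregations"][agg_name]["buckets"]]


def unusual_buckets(counts: list[int], factor: float = 3.0) -> list[tuple[int, int, str]]:
    """Flag buckets more than `factor` times above or below the median count."""
    if not counts:
        return []
    baseline = median(counts)
    flagged = []
    for i, count in enumerate(counts):
        # max(baseline, 1) stops a sparse keyword (median 0) from flagging every single hit.
        if count > factor * max(baseline, 1):
            flagged.append((i, count, "increase"))
        elif baseline > 0 and count < baseline / factor:
            flagged.append((i, count, "decrease"))
    return flagged


if __name__ == "__main__":
    counts = [12, 10, 11, 9, 60, 10, 0, 11]
    print(unusual_buckets(counts))
    # → [(4, 60, 'increase'), (6, 0, 'decrease')]
```

A sudden drop to zero can be as telling as a spike, for example a component that stopped logging. For that reason the sketch flags decreases as well as increases.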
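The "go deep" questions about time and space can be supported by small helpers, though reading the sampled entries is still the core of the step. The Python sketch below does three things. It picks a random sample of entries, checks whether the entries arrive at nearly regular intervals, and counts which components emit them. It assumes each entry has been parsed into a dict with hypothetical `timestamp` (a `datetime`) and `component` fields. The 0.1 threshold on the coefficient of variation of the gaps is an arbitrary heuristic.

```python
import random
from collections import Counter
from datetime import datetime, timedelta
from statistics import mean, pstdev
from typing import Optional


def sample_entries(entries: list[dict], k: int = 10, rng: Optional[random.Random] = None) -> list[dict]:
    """Pick `k` entries at random (all of them if there are fewer)."""
    rng = rng or random.Random()
    return rng.sample(entries, min(k, len(entries)))


def looks_periodic(timestamps: list[datetime], max_cv: float = 0.1) -> bool:
    """Heuristic: entries look periodic if the gaps between them are nearly equal."""
    ts = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2 or mean(gaps) == 0:
        return False  # too few distinct entries to tell
    # Coefficient of variation of the gaps; 0 means perfectly regular.
    return pstdev(gaps) / mean(gaps) <= max_cv


def top_components(entries: list[dict], n: int = 3) -> list[tuple[str, int]]:
    """Components emitting the most matching entries, most frequent first."""
    return Counter(e["component"] for e in entries).most_common(n)


if __name__ == "__main__":
    start = datetime(2024, 1, 1)
    # Hypothetical entries: one every 5 minutes from two made-up components.
    entries = [
        {"timestamp": start + timedelta(minutes=5 * i), "component": "cronjob" if i % 4 else "ingress"}
        for i in range(40)
    ]
    for e in sample_entries(entries):
        print(e["timestamp"].isoformat(), e["component"])
    print(looks_periodic([e["timestamp"] for e in entries]))  # evenly spaced -> True
    print(top_components(entries))
```

Periodic entries often point to a scheduled job or a retry loop. Random ones are more likely to be triggered by users or external traffic. Either way, confirm with the component owner before drawing conclusions.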