The Royal Statistical Society’s Merseyside Local Group are pleased to announce our next event, "Statistics in Cybersecurity".
Each of us relies on online systems to handle our personal data every day, from email and web browsers to less obvious examples such as televisions and smart devices. Security is a constant concern, and cybersecurity is a fast-moving field that draws on mathematics, statistics, cryptography, and computer science. These talks will explore some fascinating cases of how statistics is being used to keep digital networks safe from threats and exploitation.
This event will be followed by the Annual General Meeting of the Merseyside Local Group.
14:00 - 14:10 Welcome/Chair's introduction
14:10 - 14:55 Dr Antony Lawson (Darktrace PLC) -
Analysing email structure to detect malicious intent
14:55 - 15:10 Refreshment break
15:10 - 15:55 Prof. Nick Heard (Imperial College London) -
Identifying hacker groups in honeypots and other statistical challenges in cyber-security
15:55 - 16:00 Close (followed by AGM)
Dr Antony Lawson (Darktrace PLC) –
"Analysing email structure to detect malicious intent"
Malicious emails usually try to induce particular behaviour/actions from the recipient. For example, extortion emails typically threaten to release embarrassing videos of the recipient unless a crypto-currency payment is made. Phishing emails usually seek to harvest credentials either through fake login pages or malicious payloads.
Looking for such emails by specific content, address or domain is a poor strategy as these features change over time. Further, such emails may try to capitalise on contemporary topics, such as the Covid-19 pandemic or the current cost of living crisis.
Therefore, we instead target the underlying structure and non-specific content of such emails. This is achieved by analysing factors such as the number of sentences, average sentence length, number of characters to the first link, cryptocurrency references, HTML tag density and use of non-standard punctuation. Some NLP (natural language processing) is used to identify key, non-specific words/phrases to further supplement the analysis.
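For readers who would like a concrete picture of such structural features, the following is a minimal Python sketch and not the speaker's implementation. The function name `extract_features`, the regular expressions, the cryptocurrency keyword list and the definition of "non-standard punctuation" (runs of repeated `!`/`?` or four or more dots) are all assumptions made for illustration.

```python
import re

# All patterns below are illustrative assumptions, not taken from the talk.
HTML_TAG = re.compile(r"<[^>]+>")
LINK = re.compile(r"https?://\S+", re.IGNORECASE)
CRYPTO_TERMS = re.compile(
    r"\b(?:bitcoin|btc|ethereum|monero|crypto(?:currency)?|wallet)\b",
    re.IGNORECASE,
)
ODD_PUNCT = re.compile(r"[!?]{2,}|\.{4,}")


def extract_features(raw: str) -> dict:
    """Compute content-agnostic structural features of an email body."""
    # Remove HTML tags before measuring sentences and keywords.
    text = HTML_TAG.sub(" ", raw).strip()

    # Naive sentence split: break on whitespace that follows ., ! or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    word_counts = [len(s.split()) for s in sentences]
    avg_len = sum(word_counts) / len(word_counts) if word_counts else 0.0

    first_link = LINK.search(raw)

    return {
        "num_sentences": len(sentences),
        "avg_sentence_len": avg_len,  # in words
        # Offset of the first link in the raw body, or -1 if there is none.
        "chars_to_first_link": first_link.start() if first_link else -1,
        "crypto_refs": len(CRYPTO_TERMS.findall(text)),
        # Tags per character of the raw body.
        "html_tag_density": len(HTML_TAG.findall(raw)) / max(len(raw), 1),
        "odd_punct_runs": len(ODD_PUNCT.findall(text)),
    }


if __name__ == "__main__":
    sample = "Pay now!! Send 1 BTC to my wallet. Visit http://example.com/pay today."
    print(extract_features(sample))
```

None of these features depends on a particular sender, domain or topical keyword, which is the property the abstract highlights. A real system would feed a vector like this into a trained classifier.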
The resulting classifier returns scores in four categories of inducement: extortion, phishing, solicitation and other spam. This approach deals with new content/email addresses better than many other approaches and is demonstrably effective. For example, the classifier was initially trained on a dataset that predated the Covid-19 pandemic. However, it was still able to identify phishing emails related to the topic effectively, including one which encouraged employees to log in and donate to their company’s Covid relief fund!
Additionally, a profile of typical behaviour can be developed by tracking a sender’s inducement scores over time. Inducement scores of new emails can then be compared to the profile to obtain an “inducement shift” score. A high inducement shift score could be indicative of account hijacking, and can supplement the inducement scores. This allows the autonomous-response platform to take a more robust response.
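As a rough illustration of the profile idea, the sketch below keeps a running mean of a sender's four inducement scores and measures how far a new email's scores fall from it. The abstract does not say how the profile or the shift score is actually computed; the running mean, the Euclidean distance and the `SenderProfile` class are assumptions chosen for simplicity.

```python
import math
from dataclasses import dataclass, field

# The four inducement categories named in the abstract.
CATEGORIES = ("extortion", "phishing", "solicitation", "other_spam")


@dataclass
class SenderProfile:
    """Hypothetical per-sender profile: running mean of inducement scores."""

    count: int = 0
    mean: list = field(default_factory=lambda: [0.0] * len(CATEGORIES))

    def shift(self, scores) -> float:
        """Euclidean distance from the profile mean (assumed shift measure)."""
        if self.count == 0:
            return 0.0  # no history yet, so nothing to compare against
        return math.sqrt(sum((s - m) ** 2 for s, m in zip(scores, self.mean)))

    def update(self, scores) -> None:
        """Fold a new email's scores into the running mean."""
        self.count += 1
        self.mean = [m + (s - m) / self.count for s, m in zip(scores, self.mean)]


if __name__ == "__main__":
    profile = SenderProfile()
    # Made-up scores for a sender who normally sends mild solicitations.
    for scores in ([0.0, 0.0, 0.2, 0.1], [0.0, 0.0, 0.4, 0.1]):
        profile.update(scores)
    # A sudden phishing-like email from the same address gives a large shift.
    print(profile.shift([0.0, 0.9, 0.3, 0.1]))
```

In this sketch the shift is computed before the new email is folded into the profile, so an out-of-character message is compared against past behaviour only.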
Prof. Nick Heard (Imperial College London) –
“Identifying hacker groups in honeypots and other statistical challenges in cyber-security”
This talk will present a summary of some existing applications of statistical methods in cyber-security, mainly concerned with modelling the normal but heterogeneous day-to-day behaviours observed in enterprise computer networks and performing anomaly detection. Focus will then switch to a more recent investigation detecting cluster structure of interactive sessions through topic-style modelling of commands issued by network intruders; an aim of these analyses is to uncover emerging new cyber threats or intents.