UC7 : Suspicious security events detection

From Wiki Campus Cyber

Development of the educational use case ‘Suspicious security events detection’ as part of the AI and Cyber working group

Category: Common

Status: Production (scale: 1 : Idea - 2 : Prototype - 3 : Validation - 4 : Production)


Overview

Project Objectives

The overall objective of the Cyber project is to provide educational notebooks that present a set of algorithms applied to a specific cybersecurity application.

Business Objectives

The use case (UC7) aims to detect abnormal security events among web server events.

Data & methodologies

By exploring web activity, the use case applies two statistical methods, the interquartile range (IQR) and Isolation Forest, to identify "outlier" behaviors that are rare compared with the most common behaviors.
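
As an illustration of the IQR method, the sketch below flags values lying outside the usual Tukey fences (Q1 - 1.5 × IQR and Q3 + 1.5 × IQR). It is not taken from the use case notebooks: the function name `iqr_outliers`, the multiplier `k = 1.5` and the per-IP request counts are assumptions chosen for the example.

```python
import numpy as np


def iqr_outliers(values, k=1.5):
    """Return a boolean mask that is True for values outside the Tukey fences.

    `k` is the fence multiplier; 1.5 is a common default, assumed here.
    """
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (values < lower) | (values > upper)


# Hypothetical number of requests sent by eight client IP addresses.
requests_per_ip = [12, 15, 9, 14, 11, 13, 10, 480]

mask = iqr_outliers(requests_per_ip)
flagged = [v for v, is_outlier in zip(requests_per_ip, mask) if is_outlier]
print(flagged)  # -> [480]
```

The IQR rule works on one variable at a time, so in practice it would be applied separately to each behavioral indicator of interest.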

Results

This section is not really applicable, since the current objective is neither a production deployment nor a proof of concept (POC). Nevertheless, the two models highlight the most abnormal IP addresses (and potentially the related users). These lists of addresses can be used as input for further investigation by an operational expert.
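
To illustrate how such a list of abnormal IP addresses might be produced, here is a minimal sketch based on scikit-learn's `IsolationForest`. The synthetic per-IP features (request count and share of POST requests), the IP addresses and the hyperparameters are assumptions made for the example and do not reflect the notebook's actual settings.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)

# Synthetic per-IP features (assumed): [request count, share of POST requests].
normal = np.column_stack([
    rng.normal(50, 5, 200),      # typical request volume
    rng.normal(0.1, 0.02, 200),  # typical share of POST requests
])
suspicious = np.array([[900.0, 0.95]])  # one deliberately extreme client
X = np.vstack([normal, suspicious])

# Made-up addresses, one per row of X (the last one is the extreme client).
ips = [f"10.0.0.{i}" for i in range(len(X))]

model = IsolationForest(n_estimators=100, random_state=0)
model.fit(X)

# score_samples returns lower values for more abnormal points.
scores = model.score_samples(X)
ranking = np.argsort(scores)

# The five most abnormal addresses, most abnormal first.
top_ips = [ips[i] for i in ranking[:5]]
print(top_ips)
```

A ranked list like `top_ips` is the kind of output that could be handed to an analyst; the cut-off of five addresses is arbitrary.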

  • Authors : Nicolas Stucki & Thomas Levy
  • Keywords: Unsupervised detection, clustering, outlier detection, Isolation Forest, interquartile range (IQR)

Data

Online Shopping Store

This use case relies on the dataset ["Online Shopping Store - Web Server Logs"](https://doi.org/10.7910/DVN/3QBYB5).

The dataset was first converted from a raw log file into tabular records, each corresponding to a client request to the web site. To reduce its size, it was then sampled to keep only the requests coming from a subset of the client IP addresses. Finally, three CSV files were ingested into the platform:

  • Small size dataset: logs_sub_2.csv (211 335 lines)
  • Medium size dataset: logs_sub_5.csv (495 997 lines)
  • Large size dataset: logs_sub_10.csv (1 030 453 lines)

Main columns used are:

| Name | Description |
|---|---|
| SourceIP | Source IP address |
| datetime | Local date and time at which the web server received the request |
| method | HTTP method of the request (e.g. GET, POST) |
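
The columns above can be summarized per client before any outlier detection. The sketch below shows one possible aggregation with pandas. The inline sample stands in for one of the CSV files, and its values, the comma delimiter, the timestamp format and the feature names `n_requests`, `post_ratio` and `active_minutes` are all assumptions rather than the notebook's actual choices.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for a file such as logs_sub_2.csv.
sample_csv = io.StringIO(
    "SourceIP,datetime,method\n"
    "10.0.0.1,2019-01-22 03:56:14,GET\n"
    "10.0.0.1,2019-01-22 03:56:40,GET\n"
    "10.0.0.1,2019-01-22 03:57:02,POST\n"
    "10.0.0.2,2019-01-22 04:10:00,GET\n"
)
logs = pd.read_csv(sample_csv, parse_dates=["datetime"])

# One row per source IP with a few simple behavioral indicators.
features = (
    logs.groupby("SourceIP")
    .agg(
        n_requests=("method", "size"),
        post_ratio=("method", lambda m: (m == "POST").mean()),
        active_minutes=("datetime", lambda t: t.dt.floor("min").nunique()),
    )
    .reset_index()
)
print(features)
```

Each row of `features` then describes one IP address, which is a convenient shape for both a per-column IQR rule and a multivariate model.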

Specific feature engineering

Not applicable

Notebooks

| Notebook | Data Science step |
|---|---|
| UseCase7_DataPrep.ipynb | Data preparation (executed outside the platform) |
| UseCase7_Detection.ipynb | Security events detection (including simple feature engineering) |

Risks & Compliance

| Type | Applicable | Comment(s) (if applicable) |
|---|---|---|
| Bias | No | Has a complete bias study been carried out? Comments if applicable |
| Ethics committee | No | Does the project need to be screened by an ethics committee, and if so, has this been done? Comments if applicable |
| RSSI (CISO) | No | Does the project need to inform the RSSI about its outcome and activity, and if so, has this been done? Comments if applicable |
| DPO | No | Does the project need to inform the DPO about its outcome and activity, and if so, has this been done? Comments if applicable |
| RGPD (GDPR) | No | Are all the collected data necessary to achieve the project objective, and is it necessary to collect consent from individuals? Comments if applicable |
| CNIL | No | Does the project need to inform the CNIL? Comments if applicable |

Requirements

  • Python (3.6 or later)
  • scikit-learn (1.0.2 or later)
  • seaborn (0.11.2 or later)

Notebooks

All the elements of the Use Case are available on the Campus Cyber GitLab: https://gitlab.com/campuscyber/gt-ia-et-cyber/-/tree/main/UC7%20Suspicious%20security%20events%20detection?ref_type=heads

Working Group

IA et cybersécurité