UC3 : Brand protection, fight against typo-squatting

From Wiki Campus Cyber

Development of the educational use case "UC 3: Fight against typosquatting" as part of the GT IA et Cyber (AI and Cyber working group)

Category: Common. Status: Production (1: Idea, 2: Prototype, 3: Validation, 4: Production)


Overview

Business Objective

Typo-squatting is a form of cybersquatting that relies mainly on the typing and spelling errors Internet users make when entering a web address in a browser.

Concretely, the typosquatter registers domain names whose spelling or pronunciation is close to that of a high-traffic website or a well-known brand, so that a user who makes a spelling mistake or an unintentional typo is directed to a site owned by the attacker.
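To make the idea concrete, here is a minimal sketch of how typo variants of a domain can be generated. It covers only three simple typo types (omitting a character, typing a character twice, swapping two adjacent characters) and is a simplified stand-in, not DNSTwist's permutation logic, which covers many more cases (homoglyphs, bit flips, TLD swaps, etc.). The function name `typo_variants` is hypothetical.

```python
from typing import Set


def typo_variants(domain: str) -> Set[str]:
    """Generate simple typo variants of a domain such as 'lvmh.com'.

    Hypothetical helper: only omission, repetition and transposition
    typos are applied, and only to the label before the first dot.
    """
    name, _, suffix = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        # Omission: drop one character ("lvmh" -> "lmh")
        variants.add(name[:i] + name[i + 1:])
        # Repetition: type one character twice ("lvmh" -> "llvmh")
        variants.add(name[:i] + name[i] + name[i:])
        # Transposition: swap two adjacent characters ("lvmh" -> "vlmh")
        if i < len(name) - 1:
            variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    # Drop empty labels and the original label itself
    variants.discard("")
    variants.discard(name)
    return {v + "." + suffix for v in variants}
```

For example, `typo_variants("lvmh.com")` yields candidates such as `lmh.com` (omission) and `vlmh.com` (transposition); each candidate would then still need to be checked for registration and content.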

This practice creates multiple risks, both for users (malware, ransomware) and for the targeted brand (loss of customers, damage to its image).

This project proposes a solution for detecting suspicious typosquatting websites.

Project Objective

To detect typosquatting websites, this set of notebooks computes, for each candidate website, a similarity score relative to the reference website (the high-traffic website).

This score lies between 0 and 1: a score of 1 means the two websites are identical, and the closer a website's score is to 0, the more suspicious that website is.
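The Jaccard similarity named for the third notebook naturally produces a score in this range: for two sets A and B it is |A ∩ B| / |A ∪ B|. A minimal sketch, assuming each page has already been reduced to a set of lowercase words; the tokeniser, the function names and the sample page texts are hypothetical simplifications, not the notebooks' exact preprocessing.

```python
import re
from typing import Set


def tokens(text: str) -> Set[str]:
    """Reduce page text to a set of lowercase alphanumeric words (simplified tokeniser)."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_similarity(a: Set[str], b: Set[str]) -> float:
    """|A ∩ B| / |A ∪ B|: 1.0 for identical sets, 0.0 for disjoint ones."""
    if not a and not b:
        # Common convention for two empty sets: treat them as identical
        return 1.0
    return len(a & b) / len(a | b)


parent = tokens("Welcome to the official Example store")
child = tokens("Welcome to our cheap store")
# 3 shared words ("welcome", "to", "store") out of 8 distinct words
score = jaccard_similarity(parent, child)
```

Here `score` is 3/8 = 0.375, so the child page shares only part of the reference page's vocabulary.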

Results

The algorithm produces a blacklist: a list of suspicious URLs considered to be typosquatting websites.
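A minimal sketch of this last step, assuming every candidate URL already has a similarity score and that a fixed threshold separates suspicious sites from the rest. The threshold of 0.5, the function name and the `.test` domains with their scores are invented for illustration; the notebooks may choose the cut-off differently.

```python
from typing import Dict, List

# Hypothetical cut-off: scores at or below it are treated as suspicious
SUSPICION_THRESHOLD = 0.5


def build_blacklist(scores: Dict[str, float],
                    threshold: float = SUSPICION_THRESHOLD) -> List[str]:
    """Return candidate URLs whose similarity to the reference site is low,
    most suspicious (lowest score) first."""
    suspicious = [url for url, score in scores.items() if score <= threshold]
    return sorted(suspicious, key=lambda url: scores[url])


# Invented example scores for child URLs of a reference site "example.test"
scores = {"exmaple.test": 0.12, "exampel.test": 0.45, "examplle.test": 0.8}
blacklist = build_blacklist(scores)
```

With these inputs, `blacklist` contains `exmaple.test` followed by `exampel.test`; sorting by score puts the candidates most worth reviewing first.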

  • Authors: Nicolas Stucki, Ugo Biancone & Mathieu Lacroix
  • Keywords: Permutations, Lines of Threat, Web scraping, Word embedding, Jaccard similarity

Data

A CSV file containing only the target website URLs, e.g. lvmh.com, facebook.com, ...
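A minimal sketch of loading such a file with the standard `csv` module, assuming one URL per row and no header. The function name, the normalisation (lowercasing, dropping a scheme or trailing slash) and the file name `targets.csv` are hypothetical; since pandas is among the requirements, `pandas.read_csv(path, header=None)` would be the pandas equivalent.

```python
import csv
from typing import Iterable, List


def load_targets(lines: Iterable[str]) -> List[str]:
    """Read target domains from CSV rows (one URL per row, no header)."""
    targets = []
    for row in csv.reader(lines):
        # Skip blank lines, which csv.reader returns as empty rows
        if row and row[0].strip():
            # Assumed normalisation: lowercase, drop any scheme and trailing slash
            url = row[0].strip().lower()
            url = url.split("://", 1)[-1].rstrip("/")
            targets.append(url)
    return targets


# Usage with a hypothetical file name:
# with open("targets.csv", newline="") as f:
#     targets = load_targets(f)
```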

Notebooks

Each notebook implements one data science step:
  • Typo_p1_DNSTwist.ipynb: generates child URLs by permuting the parent address
  • Typo_p2_Webscraping.ipynb: retrieves the content of the child and parent web pages
  • Typo_p3_webscrapingjaccard: computes a similarity measure (Jaccard) between the child and parent pages
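The scraping step has to turn each downloaded page into plain text before any similarity can be computed. The notebooks rely on Requests and BeautifulSoup for this; as a dependency-free sketch, the standard library's `html.parser` can extract the visible text of a page already held as a string, skipping `<script>` and `<style>` content. The class and function names are hypothetical.

```python
from html.parser import HTMLParser
from typing import List


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes from an HTML page, ignoring <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-blank text that is not inside <script>/<style>
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the visible text of an HTML document, space-separated."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

In a live pipeline the HTML would come from something like `requests.get(url, timeout=10).text`; passing it in as a string keeps the sketch runnable offline.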

Requirements

  • Python (3.6 or later)
  • Pandas (1.1 or later)
  • Numpy (1.19 or later)
  • DNSTwist (20220131)
  • Requests (2.27)
  • BeautifulSoup (4 or later)
  • Gensim (4.2.0)
  • Scipy (1.6.2)

Use case notebooks

All the materials for this use case are available on the Campus Cyber GitLab: https://gitlab.com/campuscyber/gt-ia-et-cyber/-/tree/main/UC3%20Brand%20protection,%20fight%20against%20typo-squatting?ref_type=heads

Working group

AI and cybersecurity