UC3 : Brand protection, fight against typo-squatting

Development of the educational use case "UC 3: Fight against typosquatting" within the GT IA et Cyber (AI and Cyber working group).

Category: Common
Status: Production (scale: 1: Idea - 2: Prototype - 3: Validation - 4: Production)

Overview

Business Objective

Typo-squatting is a form of cybersquatting based mainly on typing and spelling errors made by the Internet user when entering a web address in a browser.

Concretely, the typosquatter buys domain names whose spelling or pronunciation is close to that of a heavily visited site or a well-known brand, so that a user who makes a spelling mistake or an accidental typo is directed to a site owned by the attacker.

This practice creates multiple risks for users (malware, ransomware) and/or for the well-known brand (loss of customers, damaged image).

This project proposes a solution to detect suspicious typosquatting websites.

Project Objective

To detect typosquatting websites, this set of notebooks computes, for each candidate website, a similarity score relative to the reference website (a high-traffic website).

This score is a value between 0 and 1, where 1 means the two websites are identical. The closer a website's score is to 0, the more suspicious the website is.
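
As an illustration of how such a score can be obtained, here is a minimal sketch of a Jaccard similarity computed over the word sets of two page texts. It is not taken from the notebooks: the function names `tokenize` and `jaccard_similarity`, the word-level tokenization, and the sample texts are assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard_similarity(text_a, text_b):
    """Return |A ∩ B| / |A ∪ B| over the word sets of the two texts."""
    tokens_a, tokens_b = tokenize(text_a), tokenize(text_b)
    union = tokens_a | tokens_b
    if not union:
        # Assumption: two empty pages are treated as identical.
        return 1.0
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical page texts, for illustration only.
reference_text = "Welcome to the official store"
candidate_text = "Welcome to the store login"
score = jaccard_similarity(reference_text, candidate_text)
print(round(score, 3))
```

The score stays in the range 0 to 1 by construction: identical word sets give 1, and disjoint word sets give 0.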

Results

The algorithm creates a blacklist: a list of suspicious URLs considered to be typosquatting websites.
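
A blacklist of this kind could be derived from the similarity scores along the following lines. This is only a sketch: the function name `build_blacklist`, the cut-off value of 0.5, and the sample domains and scores are assumptions, not values from the project.

```python
def build_blacklist(scores, threshold=0.5):
    """Return the URLs scoring below the threshold, most suspicious first.

    Following the convention above, a lower score means a more suspicious site.
    The default threshold is an arbitrary illustrative value.
    """
    flagged = [(url, score) for url, score in scores.items() if score < threshold]
    flagged.sort(key=lambda item: item[1])
    return [url for url, _ in flagged]


# Hypothetical scores, for illustration only.
scores = {"examp1e.com": 0.12, "exmaple.com": 0.81, "example.co": 0.40}
blacklist = build_blacklist(scores)
print(blacklist)
```

In practice the threshold would need to be tuned against pages that have been reviewed by hand.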

Authors : Nicolas Stucki & Ugo Biancone & Mathieu Lacroix
Keywords: Permutations, Lines of Threat, WebScraping, WordEmbedding, Jaccard Similarity

Data

A CSV file containing only the target website URLs, e.g. lvmh.com, facebook.com, ...
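
Reading such a file might look like the sketch below. It assumes one domain per row and no header row, which the description above does not confirm; the function name `load_targets` is hypothetical, and the standard `csv` module is used here instead of Pandas only to keep the example dependency-free.

```python
import csv
import io


def load_targets(csv_file):
    """Read one target domain per row, skipping blank rows."""
    targets = []
    for row in csv.reader(csv_file):
        if row and row[0].strip():
            targets.append(row[0].strip().lower())
    return targets


# An in-memory file stands in for the real CSV (assumed layout).
sample = io.StringIO("lvmh.com\nfacebook.com\n\n")
targets = load_targets(sample)
print(targets)
```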

Notebooks

Notebook	Data Science step
Typo_p1_DNSTwist.ipynb	Generate child URLs from permutations of the parent address
Typo_p2_Webscraping.ipynb	Retrieve the content of the child and parent web pages
Typo_p3_webscrapingjaccard	Compute a similarity measure between the child and parent pages (Jaccard)
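
The first two steps can be illustrated with a small standard-library sketch. It does not reproduce the notebooks: the project relies on DNSTwist for permutations and on Requests and BeautifulSoup for scraping, whereas the hypothetical helpers `typo_variants` and `visible_text` below only cover three simple typo families (omission, transposition, repetition) and parse an HTML string that is assumed to be already downloaded.

```python
from html.parser import HTMLParser


def typo_variants(domain):
    """Generate simple typo variants of the label before the first dot."""
    name, dot, rest = domain.partition(".")
    variants = set()
    for i in range(len(name)):
        # Omission: drop the character at position i.
        variants.add(name[:i] + name[i + 1:])
        # Repetition: double the character at position i.
        variants.add(name[:i] + name[i] + name[i:])
    for i in range(len(name) - 1):
        # Transposition: swap two adjacent characters.
        variants.add(name[:i] + name[i + 1] + name[i] + name[i + 2:])
    variants.discard(name)
    variants.discard("")
    return sorted(variant + dot + rest for variant in variants)


class VisibleText(HTMLParser):
    """Collect text nodes, ignoring the content of script and style tags."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def visible_text(html):
    """Return the visible text of an HTML document as one string."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


children = typo_variants("lvmh.com")
page = "<html><body><p>Hello <b>world</b></p><script>x=1</script></body></html>"
print(len(children), visible_text(page))
```

The extracted text of a child page and of its parent page is what a Jaccard comparison in the third step would take as input.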

Requirements

Python (3.6 or later)
Pandas (1.1 or later)
Numpy (1.19 or later)
DNSTwist (20220131)
Requests (2.27)
BeautifulSoup (4 or later)
Gensim (4.2.0)
Scipy (1.6.2)

Use case notebooks

Find all the elements of the use case on the Campus Cyber GitLab: https://gitlab.com/campuscyber/gt-ia-et-cyber/-/tree/main/UC3%20Brand%20protection,%20fight%20against%20typo-squatting?ref_type=heads

Working group

AI and cybersecurity (IA et cybersécurité)