utils module

Perform actions related to typosquatting.

These are the important misfits: functions that do not belong in any other module but still need a home.

utils.compare_metadata(pkg1, pkg2)

Retrieve and compare metadata of two PyPI packages.

Determine whether the two packages share no identical metadata fields (i.e. no risk) or at least one identical field (i.e. some risk). This function operates on the theory that typosquatting packages sometimes, perhaps often, copy metadata from the original package in order to trick unsuspecting users.

Parameters:
  • pkg1 (str) – name of first package to compare
  • pkg2 (str) – name of second package to compare
Returns:

a value of “no_risk” or “some_risk”

Return type:

str
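
The field-by-field check described above can be sketched as follows. This is a minimal illustration rather than the module's implementation: it assumes the metadata for both packages has already been retrieved as dictionaries, and the helper name `classify_metadata_risk`, the field names in `FIELDS`, and the choice to ignore empty values are all hypothetical.

```python
# Hypothetical metadata fields to compare; the real function may check others.
FIELDS = ("author", "author_email", "home_page", "summary")


def classify_metadata_risk(meta1, meta2, fields=FIELDS):
    """Return "some_risk" if any non-empty field matches, else "no_risk"."""
    for field in fields:
        value1 = meta1.get(field)
        value2 = meta2.get(field)
        # Assumption: skip empty values so two blank fields are not a match.
        if value1 and value1 == value2:
            return "some_risk"
    return "no_risk"


original = {"author": "Jane Doe", "summary": "HTTP for humans"}
lookalike = {"author": "Jane Doe", "summary": "Totally different"}
unrelated = {"author": "Someone Else", "summary": "Another tool"}

print(classify_metadata_risk(original, lookalike))  # some_risk
print(classify_metadata_risk(original, unrelated))  # no_risk
```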

utils.create_potential_squatter_names(module_name)

Create a set of potential typosquatting names.

Given a module name, create a set of potential typosquatting names based on QWERTY distance, a measure of how close keys are to each other on the keyboard. This models likely mistypes more closely than Levenshtein distance, which counts character edits without regard to where the keys sit.

Parameters:
  • module_name (str) – a name for a module
Returns:

potential typosquatting names

Return type:

list

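
One way to generate keyboard-based candidates is to replace each character with a key that sits next to it. The sketch below is a simplified stand-in for the QWERTY distance described above, not the module's own algorithm: it only considers left and right neighbors on the same row, and the names `ROWS`, `horizontal_neighbors`, and `neighbor_substitutions` are hypothetical.

```python
# Letter rows of a QWERTY keyboard (digits and punctuation are ignored here).
ROWS = ("qwertyuiop", "asdfghjkl", "zxcvbnm")


def horizontal_neighbors(char):
    """Return the keys immediately left and right of char on its row."""
    for row in ROWS:
        index = row.find(char)
        if index != -1:
            return [row[i] for i in (index - 1, index + 1) if 0 <= i < len(row)]
    return []


def neighbor_substitutions(name):
    """Return names formed by swapping one character for an adjacent key."""
    candidates = set()
    for position, char in enumerate(name):
        for neighbor in horizontal_neighbors(char):
            candidates.add(name[:position] + neighbor + name[position + 1:])
    return sorted(candidates)


print(neighbor_substitutions("pip"))  # ['oip', 'pio', 'pop', 'pup']
```
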
utils.create_suspicious_package_dict(all_packages, top_packages, max_distance=1)

Examine all top packages for typosquatters.

Loop through all top packages and check for instances of typosquatting. This includes confusion

Parameters:
  • all_packages (list) – all package names
  • top_packages (list) – package names to perform comparison
  • max_distance (int) – maximum edit distance to check for typosquatting
Returns:

top packages (key) and potential typosquatters (value)

Return type:

dict
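
The loop described above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: it takes "edit distance" to mean Levenshtein distance, and the helper names `edit_distance` and `find_lookalikes` are hypothetical.

```python
def edit_distance(a, b):
    """Compute the Levenshtein distance between two strings."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            substitution = previous[j - 1] + cost
            current.append(min(deletion, insertion, substitution))
        previous = current
    return previous[-1]


def find_lookalikes(all_packages, top_packages, max_distance=1):
    """Map each top package to other names within max_distance edits."""
    suspicious = {}
    for top in top_packages:
        suspicious[top] = [
            name for name in all_packages
            if name != top and edit_distance(top, name) <= max_distance
        ]
    return suspicious


all_names = ["requests", "reqeusts", "request", "numpy", "nunpy", "flask"]
print(find_lookalikes(all_names, ["requests", "numpy"]))
# {'requests': ['request'], 'numpy': ['nunpy']}
```

Note that a swapped pair of letters such as "reqeusts" counts as two edits under Levenshtein distance, so it only appears when `max_distance` is raised to 2.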

utils.load_most_recent_packages(folder='package_lists')

Load the most recent package list from at least 24 hours ago.

Load the JSON file containing PyPI packages with the most recent timestamp that was created at least 24 hours ago.

Parameters:
  • folder (str) – Folder in which to check for file
Returns:

Packages loaded from JSON file

Return type:

package_set (set)

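
Selecting "the most recent file that is at least 24 hours old" can be sketched as follows. The source does not say how file age is determined, so this example assumes it is judged by file modification time and that each file holds a JSON list of package names. The function name `load_latest_older_than` and its parameters are hypothetical.

```python
import json
import os
import time


def load_latest_older_than(folder, min_age_seconds=24 * 60 * 60, now=None):
    """Load the newest JSON file in folder that is at least min_age_seconds old.

    Assumption: age is measured from the file's modification time.
    """
    now = time.time() if now is None else now
    eligible = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.name.endswith(".json"):
                continue
            modified = entry.stat().st_mtime
            # Keep only files old enough to qualify.
            if now - modified >= min_age_seconds:
                eligible.append((modified, entry.path))
    if not eligible:
        raise FileNotFoundError(f"no sufficiently old package list in {folder}")
    # The largest modification time among eligible files is the most recent.
    _, newest_path = max(eligible)
    with open(newest_path, encoding="utf-8") as handle:
        return set(json.load(handle))
```
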
utils.print_suspicious_packages(packages)

Pretty print a suspicious package list.

Packages with any identical metadata are printed in red, while other potential typosquatters are printed in the default text color.

Parameters:
  • packages (dict) – (key) package and (value) potential typosquatters

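
Red highlighting in a terminal is commonly done with ANSI escape codes. The sketch below assumes that approach and assumes the names with identical metadata are supplied separately as a set. The source does not say how the function identifies them, so `format_report` and its `risky_names` parameter are hypothetical.

```python
RED = "\033[91m"   # ANSI escape code: bright red foreground
RESET = "\033[0m"  # ANSI escape code: restore default colors


def format_report(packages, risky_names=frozenset()):
    """Build one line per top package, wrapping risky names in red."""
    lines = []
    for top_package, candidates in packages.items():
        shown = [
            f"{RED}{name}{RESET}" if name in risky_names else name
            for name in candidates
        ]
        lines.append(f"{top_package}: {', '.join(shown)}")
    return lines


for line in format_report({"requests": ["request", "requestz"]}, {"request"}):
    print(line)
```
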
utils.store_recent_scan_results(packages, folder='package_lists')

Store results of scanning packages recently added to PyPI.

Save a timestamped version of the JSON file to allow analysis of packages recently added to PyPI.

Parameters:
  • packages (list) – Packages on PyPI
  • folder (str) – Folder in which to store JSON file
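
Writing a timestamped JSON file can be sketched as follows. The file naming scheme is not given in the source, so the `packages_<UTC timestamp>.json` pattern, the function name `save_timestamped`, and the `prefix` parameter are assumptions made for illustration.

```python
import json
import os
from datetime import datetime, timezone


def save_timestamped(packages, folder, prefix="packages"):
    """Write packages to folder as JSON under a UTC-timestamped file name."""
    os.makedirs(folder, exist_ok=True)
    # Hypothetical naming scheme, e.g. packages_20240101T120000Z.json
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = os.path.join(folder, f"{prefix}_{stamp}.json")
    with open(path, "w", encoding="utf-8") as handle:
        # Sort so the stored file is stable regardless of input order.
        json.dump(sorted(packages), handle)
    return path
```

Calling `save_timestamped(names, "package_lists")` would create the folder if needed and return the path of the new file.
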
utils.store_squatting_candidates(squat_candidates)

Persist results of squatting candidate search.

Dump the typosquatter candidate list to a JSON file, stored under a timestamped file name in the results folder.

Parameters:
  • squat_candidates (dict) – top packages and potential typosquatters