utils module
Perform actions related to typosquatting.
These are the important misfits: functions that don’t fit in elsewhere but still need to live in a module somewhere.

utils.compare_metadata(pkg1, pkg2)
Retrieve and compare metadata of two PyPI packages.
Determine whether the two packages' metadata share no identical fields (i.e. no risk) or at least one identical field (i.e. some risk). This function operates on the theory that typosquatting packages sometimes, perhaps often, copy the metadata of the original package in order to trick unsuspecting users.
Parameters:
- pkg1 (str) – name of first package to compare
- pkg2 (str) – name of second package to compare
Returns: a value of “no_risk” or “some_risk”
Return type: str
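The comparison rule described above can be sketched on plain dictionaries. This is only an illustration, not the module's implementation: the helper name `classify_metadata_risk`, the choice of fields, and the decision to ignore empty values are all assumptions, and the step that retrieves metadata from PyPI is left out.

```python
def classify_metadata_risk(meta1, meta2,
                           fields=("author", "author_email", "home_page", "summary")):
    """Return "some_risk" if any compared field is identical, else "no_risk".

    Hypothetical helper: the field list is an assumption for illustration.
    """
    for field in fields:
        value1 = meta1.get(field)
        value2 = meta2.get(field)
        # Assumption: missing or empty values never count as a match,
        # so two blank fields do not raise the risk level.
        if value1 and value1 == value2:
            return "some_risk"
    return "no_risk"


original = {"author": "Jane Doe", "summary": "HTTP for humans"}
lookalike = {"author": "Jane Doe", "summary": "Totally different"}
unrelated = {"author": "Someone Else", "summary": "Another tool"}

print(classify_metadata_risk(original, lookalike))  # some_risk
print(classify_metadata_risk(original, unrelated))  # no_risk
```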

utils.create_potential_squatter_names(module_name)
Create a set of potential typosquatting names.
Given a module name, create a set of potential typosquatting names based on QWERTY distance, a measure of how close keys are to each other on the keyboard. For keyboard typos this is a more targeted measure than Levenshtein distance, which counts edits without regard to key position.
Parameters: module_name (str) – a name for a module
Returns: potential typosquatting names
Return type: list
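One way to picture key-distance-based candidates is to swap each character for a neighbouring key. The sketch below is a simplification under stated assumptions: it only treats keys directly left and right on the same row as neighbours, and the names `QWERTY_ROWS`, `build_neighbors`, and `neighbor_substitutions` are hypothetical, not part of this module.

```python
# Letter rows of a QWERTY keyboard (digits and punctuation omitted).
QWERTY_ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]


def build_neighbors(rows):
    """Map each key to the keys directly left and right of it on its row."""
    neighbors = {}
    for row in rows:
        for i, key in enumerate(row):
            adjacent = set()
            if i > 0:
                adjacent.add(row[i - 1])
            if i < len(row) - 1:
                adjacent.add(row[i + 1])
            neighbors[key] = adjacent
    return neighbors


NEIGHBORS = build_neighbors(QWERTY_ROWS)


def neighbor_substitutions(name):
    """Return sorted names that differ from `name` by one adjacent-key swap."""
    candidates = set()
    for i, char in enumerate(name):
        # Characters without an entry (digits, hyphens) are left unchanged.
        for replacement in NEIGHBORS.get(char, ()):
            candidates.add(name[:i] + replacement + name[i + 1:])
    return sorted(candidates)


print(neighbor_substitutions("six"))
# ['aix', 'dix', 'sic', 'siz', 'sox', 'sux']
```

A fuller treatment would also consider keys on the rows above and below, which this sketch deliberately skips.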

utils.create_suspicious_package_dict(all_packages, top_packages, max_distance=1)
Examine all top packages for typosquatters.
Loop through all top packages and check for instances of typosquatting. This includes confusion
Parameters:
- all_packages (list) – all package names
- top_packages (list) – package names to perform comparison
- max_distance (int) – maximum edit distance to check for typosquatting
Returns: top packages (key) and potential typosquatters (value)
Return type: dict
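The shape of the returned dictionary can be illustrated with a plain edit-distance scan. This is a minimal sketch, assuming Levenshtein distance as the edit distance and ignoring any additional checks the real function performs; `edit_distance` and `find_lookalikes` are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def find_lookalikes(all_packages, top_packages, max_distance=1):
    """Map each top package to the other names within max_distance edits."""
    suspicious = {}
    for top in top_packages:
        suspicious[top] = [
            name for name in all_packages
            if name != top and edit_distance(name, top) <= max_distance
        ]
    return suspicious


all_names = ["requests", "reqests", "requestss", "numpy", "flask"]
print(find_lookalikes(all_names, ["requests", "numpy"]))
# {'requests': ['reqests', 'requestss'], 'numpy': []}
```

Comparing every top package against every package name is quadratic, so a scan over the full index is likely to need some pre-filtering, for example by name length.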

utils.load_most_recent_packages(folder='package_lists')
Load the most recent package list from at least 24 hours ago.
Among the JSON files of PyPI packages that were created at least 24 hours ago, load the one with the most recent timestamp.
Parameters: folder (str) – Folder in which to check for file
Returns: Packages loaded from JSON file
Return type: package_set (set)
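The selection rule ("newest file that is at least 24 hours old") might look roughly like the following. This sketch assumes the file's age is taken from its modification time and that each file holds a JSON array of names; the real function may instead read a timestamp from the file name. The helper name and the `now` parameter are hypothetical.

```python
import json
import os
import tempfile
import time

DAY_SECONDS = 24 * 60 * 60


def load_newest_file_older_than_a_day(folder, now=None):
    """Load the newest .json file in `folder` that is at least 24 hours old."""
    now = time.time() if now is None else now
    cutoff = now - DAY_SECONDS
    eligible = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if entry.is_file() and entry.name.endswith(".json"):
                modified = entry.stat().st_mtime
                if modified <= cutoff:
                    eligible.append((modified, entry.path))
    if not eligible:
        raise FileNotFoundError(f"no package list older than 24 hours in {folder!r}")
    _, newest_path = max(eligible)
    with open(newest_path, encoding="utf-8") as handle:
        return set(json.load(handle))


# Demonstration with three files aged 72, 30, and 1 hours.
with tempfile.TemporaryDirectory() as demo_folder:
    current_time = time.time()
    for file_name, age_hours, names in [
        ("old.json", 72, ["a"]),
        ("eligible.json", 30, ["a", "b"]),
        ("too_new.json", 1, ["a", "b", "c"]),
    ]:
        path = os.path.join(demo_folder, file_name)
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(names, handle)
        stamp = current_time - age_hours * 3600
        os.utime(path, (stamp, stamp))  # backdate the modification time
    loaded = load_newest_file_older_than_a_day(demo_folder, now=current_time)

print(sorted(loaded))  # ['a', 'b']
```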

utils.print_suspicious_packages(packages)
Pretty print a suspicious package list.
Packages with any identical metadata are printed in red, while other potential typosquatters are printed in the default text color.
Parameters: packages (dict) – (key) package and (value) potential typosquatters

utils.store_recent_scan_results(packages, folder='package_lists')
Store results of scanning packages recently added to PyPI.
Save a timestamped version of the JSON file to allow analysis of packages recently added to PyPI.
Parameters:
- packages (list) – Packages on PyPI
- folder (str) – Folder in which to store JSON file

utils.store_squatting_candidates(squat_candidates)
Persist results of squatting candidate search.
Dump the typosquatter candidate list to a JSON file, stored under a time-stamped file name in the results folder.
Parameters: squat_candidates (dict) – top packages and potential typosquatters
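Writing a dictionary to a time-stamped JSON file can be sketched as follows. The file name pattern, the UTC timestamp format, and the helper name `dump_with_timestamp` are assumptions made for illustration; the module's actual naming scheme is not documented here.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def dump_with_timestamp(data, folder, prefix="results"):
    """Write `data` as JSON to `folder` under a time-stamped name; return the path."""
    os.makedirs(folder, exist_ok=True)
    # Assumed format: compact UTC timestamp that sorts chronologically.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    path = os.path.join(folder, f"{prefix}_{stamp}.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(data, handle, indent=2, sort_keys=True)
    return path


# Round trip through a temporary folder to show the file is valid JSON.
with tempfile.TemporaryDirectory() as tmp:
    saved_path = dump_with_timestamp({"requests": ["reqests"]},
                                     os.path.join(tmp, "results"))
    with open(saved_path, encoding="utf-8") as handle:
        round_trip = json.load(handle)

print(os.path.basename(saved_path))
print(round_trip)
```

A timestamp that sorts lexicographically keeps a directory listing in chronological order, which makes it easy to pick out the latest run.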