scrapers module
Perform webscraping related to typosquatting on PyPI.
A module that contains any functions that can make internet calls to gather data related to typosquatting.
scrapers.get_all_packages(page='https://pypi.org/simple/')
Download simple list of PyPI package names.
pypi.org/simple lists the names of all current packages. This function scrapes that listing and returns the package names as a Python list.
Parameters: page (str) – webpage from which to download pypi package names
Returns: package names on pypi
Return type: list
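The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: it assumes the simple index is an HTML page in which each package name is the text of an `<a>` element, and the names `parse_simple_index` and `fetch_package_names` are hypothetical helpers introduced only for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _AnchorTextParser(HTMLParser):
    """Collect the text found inside every <a> element."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Only keep non-empty text that appears inside an anchor.
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def parse_simple_index(html):
    """Hypothetical helper: return anchor texts from an index page as a list."""
    parser = _AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.names


def fetch_package_names(page="https://pypi.org/simple/"):
    """Hypothetical helper: download the page, then parse it (needs network)."""
    with urlopen(page) as response:
        return parse_simple_index(response.read().decode("utf-8"))


# Offline demonstration on a hand-written sample page.
sample = (
    '<html><body><a href="/simple/numpy/">numpy</a>\n'
    '<a href="/simple/requests/">requests</a></body></html>'
)
print(parse_simple_index(sample))  # ['numpy', 'requests']
```

Keeping the parsing separate from the download makes it possible to try the logic on a saved page without touching the network.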
scrapers.get_metadata(name)
Retrieve pypi package metadata for one package.
Make an internet call to PyPI to retrieve the JSON metadata for a particular package, and return this information.
Parameters: name (str) – name of package on pypi for which to retrieve metadata
Returns: package metadata
Return type: dict
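As a rough sketch of what such a metadata lookup might look like, the example below uses PyPI's public JSON endpoint, `https://pypi.org/pypi/<name>/json`. Whether the module uses exactly this endpoint is an assumption, and `metadata_url` and `fetch_metadata` are hypothetical names chosen for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def metadata_url(name):
    """Hypothetical helper: build the JSON metadata URL for one package."""
    # Percent-encode the name so unusual characters cannot alter the URL path.
    return "https://pypi.org/pypi/{}/json".format(quote(name, safe=""))


def fetch_metadata(name):
    """Hypothetical helper: download and decode the metadata (needs network)."""
    with urlopen(metadata_url(name)) as response:
        return json.load(response)  # parsed JSON object, i.e. a dict


print(metadata_url("requests"))  # https://pypi.org/pypi/requests/json
```

A call such as `fetch_metadata("requests")` would then return a dict, matching the return type documented above. A request for a name that does not exist would raise `urllib.error.HTTPError`, which a caller may want to catch.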
scrapers.get_top_packages(top_n=50, stored=False)
Identify top packages by download count on pypi.
A third party provides an occasionally updated JSON feed that lets this program build a list of the top pypi packages by download count. By default, the function does a fresh pull of this feed. To use a stored list instead, set the stored flag to True.
Parameters: - top_n (int) – the number of top packages to retrieve
- stored (bool) – whether to use the stored package list
Returns: top packages
Return type: dict
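The selection of the top packages from such a feed might look like the sketch below. The feed layout used here (a `rows` list whose entries carry `project` and `download_count` keys) is an assumption made for illustration, as is the shape of the returned dict (package name mapped to download count); `select_top_packages` is a hypothetical helper, not part of the module.

```python
import json


def select_top_packages(feed, top_n=50):
    """Hypothetical helper: map the top_n package names to download counts."""
    # Sort explicitly rather than trusting the feed to be pre-sorted.
    rows = sorted(feed["rows"], key=lambda row: row["download_count"], reverse=True)
    return {row["project"]: row["download_count"] for row in rows[:top_n]}


# Hand-written sample standing in for a stored copy of the feed.
sample_feed = json.loads(
    '{"rows": ['
    '{"project": "beta", "download_count": 200}, '
    '{"project": "alpha", "download_count": 300}, '
    '{"project": "gamma", "download_count": 100}'
    "]}"
)
print(select_top_packages(sample_feed, top_n=2))  # {'alpha': 300, 'beta': 200}
```

Working from an in-memory or stored copy, as with the sample above, corresponds to the stored=True case; a fresh pull would download the JSON first and pass the decoded result to the same selection step.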