scrapers module

Perform web scraping related to typosquatting on PyPI.

This module collects the functions that make internet calls to gather data related to typosquatting.

scrapers.get_all_packages(page='https://pypi.org/simple/')

Download the simple list of PyPI package names.

pypi.org/simple lists the names of all current packages. This function scrapes that listing and returns the package names as a Python list.

Parameters:page (str) – webpage from which to download PyPI package names
Returns:package names on PyPI
Return type:list
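
The scraping step described above can be sketched roughly as follows. This is a minimal illustration, not the module's actual implementation: it assumes the simple index is an HTML page with one anchor element per package, and the names `parse_simple_index` and `fetch_package_names` are hypothetical helpers introduced only for this example.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _AnchorTextParser(HTMLParser):
    """Collect the text found inside every <a> element."""

    def __init__(self):
        super().__init__()
        self.names = []
        self._in_anchor = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        # Keep only non-empty text that appears inside an anchor.
        if self._in_anchor and data.strip():
            self.names.append(data.strip())


def parse_simple_index(html):
    """Return the anchor texts of an HTML listing as a list of names (hypothetical helper)."""
    parser = _AnchorTextParser()
    parser.feed(html)
    parser.close()
    return parser.names


def fetch_package_names(page="https://pypi.org/simple/"):
    """Download the listing at `page` and parse it (hypothetical helper, needs network access)."""
    with urlopen(page) as response:
        html = response.read().decode("utf-8")
    return parse_simple_index(html)
```

Keeping the parsing separate from the download makes it possible to exercise the parser on a saved copy of the page without touching the network.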
scrapers.get_metadata(name)

Retrieve PyPI package metadata for one package.

Make an internet call to PyPI to retrieve the JSON metadata for a particular package, and return this information.

Parameters:name (str) – name of the package on PyPI for which to retrieve metadata
Returns:package metadata
Return type:dict
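
As a rough sketch of such a lookup, the example below uses PyPI's public JSON endpoint, `https://pypi.org/pypi/<name>/json`. Whether the module uses this exact endpoint is an assumption, and `metadata_url` and `fetch_metadata` are hypothetical names introduced only for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def metadata_url(name):
    """Build the JSON metadata URL for a package name (hypothetical helper)."""
    # Percent-encode the name so unexpected characters cannot alter the URL path.
    return "https://pypi.org/pypi/{}/json".format(quote(name, safe=""))


def fetch_metadata(name):
    """Download and decode the metadata for `name` as a dict (hypothetical helper, needs network access)."""
    with urlopen(metadata_url(name)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With network access, something like `fetch_metadata("requests")["info"]["summary"]` would read a single field from the returned dict.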
scrapers.get_top_packages(top_n=50, stored=False)

Identify the top packages by download count on PyPI.

A third party provides an occasionally updated JSON feed that this program uses to build a list of the top PyPI packages by download count. By default, the function does a fresh pull of this feed. To use a stored list instead, set the stored flag to True.

Parameters:
  • top_n (int) – the number of top packages to retrieve
  • stored (bool) – whether to use the stored package list
Returns:top packages
Return type:dict
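
The selection of the top packages can be sketched as below. The layout of the feed is not documented here, so this example assumes a payload with a `rows` list whose entries carry `project` and `download_count` keys. The shape of the returned dict (name mapped to download count) and the helper names `top_packages_from_feed` and `load_stored_feed` are likewise assumptions made for illustration.

```python
import json


def top_packages_from_feed(feed, top_n=50):
    """Return the `top_n` most-downloaded projects from a parsed feed (hypothetical helper).

    Assumes `feed` looks like {"rows": [{"project": str, "download_count": int}, ...]}.
    """
    # Sort explicitly rather than trusting the feed to be pre-sorted.
    rows = sorted(feed["rows"], key=lambda row: row["download_count"], reverse=True)
    return {row["project"]: row["download_count"] for row in rows[:top_n]}


def load_stored_feed(path):
    """Read a previously saved copy of the feed from disk (hypothetical helper)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

A stored list, in this sketch, is simply the same JSON saved to a local file, so both the fresh and the stored path can share `top_packages_from_feed`.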