scrapers module

Perform web scraping related to typosquatting on PyPI.

This module contains the functions that make network calls to gather data related to typosquatting.

scrapers.get_all_packages(page='https://pypi.org/simple/')

Download the simple list of PyPI package names.

pypi.org/simple conveniently lists the names of all current packages. This function scrapes that listing and returns the package names as a Python list.

Parameters:	page (str) – web page from which to download PyPI package names
Returns:	package names on PyPI
Return type:	list
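
The scraping step described above can be sketched with the standard library alone. This is an illustrative sketch, not the module's actual implementation: it assumes the listing page contains one anchor element per package whose text is the package name, and the helper names parse_simple_index and fetch_package_names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _AnchorTextParser(HTMLParser):
    """Collect the text content of every <a> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_anchor = False
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_anchor = True
            self.names.append("")  # start a new, empty name

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False

    def handle_data(self, data):
        if self._in_anchor:
            self.names[-1] += data  # text may arrive in several chunks


def parse_simple_index(html):
    """Return the anchor texts found in html as a list of package names.

    Assumption: each package appears as <a href="...">name</a>.
    """
    parser = _AnchorTextParser()
    parser.feed(html)
    parser.close()
    return [name.strip() for name in parser.names if name.strip()]


def fetch_package_names(page="https://pypi.org/simple/"):
    """Hypothetical wrapper: download the listing and parse it."""
    with urlopen(page) as response:
        html = response.read().decode("utf-8")
    return parse_simple_index(html)
```

Keeping the parsing separate from the download means the parsing logic can be exercised on a small HTML string without any network access.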

scrapers.get_metadata(name)

Retrieve PyPI package metadata for one package.

Make a network call to PyPI for the JSON metadata of a particular package and return this information.

Parameters:	name (str) – name of the package on PyPI for which to retrieve metadata
Returns:	package metadata
Return type:	dict
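
A minimal sketch of such a lookup follows. It assumes the metadata comes from PyPI's JSON endpoint at https://pypi.org/pypi/<name>/json, which the text above does not name explicitly. The helpers metadata_url, fetch_metadata, and summarize are hypothetical and are not part of this module.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def metadata_url(name):
    """Build the JSON metadata URL for one package (assumed endpoint)."""
    return "https://pypi.org/pypi/{}/json".format(quote(name, safe=""))


def fetch_metadata(name):
    """Hypothetical fetcher: download and decode the metadata as a dict."""
    with urlopen(metadata_url(name)) as response:
        return json.load(response)


def summarize(metadata):
    """Pick a few fields from the 'info' section of a metadata dict."""
    info = metadata.get("info", {})
    return {
        "name": info.get("name"),
        "version": info.get("version"),
        "summary": info.get("summary"),
    }
```

With network access, summarize(fetch_metadata("requests")) would return a small dict with the package's name, latest version, and one-line summary.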

scrapers.get_top_packages(top_n=50, stored=False)

Identify top packages by download count on PyPI.

A third party maintains an occasionally updated JSON feed that lets this program build a list of the top PyPI packages by download count. By default, this function pulls a fresh copy of the feed. To use a stored list instead, set the stored flag to True.

Parameters:	top_n (int) – the number of top packages to retrieve
	stored (bool) – whether to use the stored package list
Returns:	top packages
Return type:	dict
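
The choice between a fresh and a stored feed, followed by the top-n selection, might look like the sketch below. The feed layout is an assumption: a JSON object with a "rows" list whose entries carry "project" and "download_count" keys. The helper names, the stored_path and feed_url parameters, and the shape of the returned dict are likewise hypothetical.

```python
import json
from urllib.request import urlopen


def load_feed(stored, stored_path, feed_url):
    """Load the feed from a local file if stored is True, else download it.

    stored_path and feed_url are hypothetical parameters for illustration.
    """
    if stored:
        with open(stored_path, encoding="utf-8") as handle:
            return json.load(handle)
    with urlopen(feed_url) as response:
        return json.load(response)


def select_top(feed, top_n=50):
    """Return a dict mapping the top_n package names to download counts.

    Assumed feed layout: {"rows": [{"project": ..., "download_count": ...}]}.
    """
    rows = sorted(
        feed["rows"],
        key=lambda row: row["download_count"],
        reverse=True,  # most downloaded first
    )
    return {row["project"]: row["download_count"] for row in rows[:top_n]}
```

Because dicts preserve insertion order, iterating over the result yields packages from most to least downloaded.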
