Distributed Frontera: Large-scale web crawling framework
Frontera is a crawl frontier implementation for web crawlers. It manages when and what to crawl next, and checks whether the crawling goal has been accomplished. Distributed Frontera is an extension to Frontera that provides replication, sharding, and isolation of all parts of a Frontera-based crawler, so it can be scaled out and distributed. Together, these packages contain the components needed to build a fully operational web crawler with Scrapy.
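As a minimal sketch of how a Scrapy project is hooked up to Frontera, the project's `settings.py` swaps in Frontera's scheduler and scheduler middlewares. The module paths below follow Frontera's Scrapy integration, but versions differ, so verify them against your installed release; the `FRONTERA_SETTINGS` module path is a hypothetical placeholder:

```python
# settings.py -- illustrative Scrapy settings for a Frontera-backed crawler.
# Verify class paths against the installed Frontera version before use.

# Replace Scrapy's default scheduler with Frontera's frontier scheduler,
# which asks the crawl frontier what to fetch next.
SCHEDULER = 'frontera.contrib.scrapy.schedulers.frontier.FronteraScheduler'

# Middlewares that route requests/responses through the frontier.
SPIDER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerSpiderMiddleware': 1000,
}
DOWNLOADER_MIDDLEWARES = {
    'frontera.contrib.scrapy.middlewares.schedulers.SchedulerDownloaderMiddleware': 1000,
}

# Module holding the frontier's own settings (backend, crawl strategy, etc.).
# 'myproject.frontera_settings' is a hypothetical path for illustration.
FRONTERA_SETTINGS = 'myproject.frontera_settings'
```

With this in place, the spider itself stays ordinary Scrapy code; scheduling decisions move from Scrapy's queue into the Frontera frontier.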
- Redefined default values
- Built-in settings