C:\scrapy(project folder)\scrapy.cfg - the system-wide configuration file;
~/.config/scrapy.cfg ($XDG_CONFIG_HOME) and ~/.scrapy.cfg ($HOME) - these are the global (user-level) settings. Settings from these files are merged, with project-level values overriding user-level ones, which in turn override the system-wide defaults.
SCRAPY_SETTINGS_MODULE
SCRAPY_PROJECT
SCRAPY_PYTHON_SHELL
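Scrapy reads these variables at startup; SCRAPY_SETTINGS_MODULE, for example, names the settings module to load. A minimal sketch of how it can be set programmatically, assuming a hypothetical module myproject.settings:

import os

# Point Scrapy at a settings module before the settings are loaded;
# "myproject.settings" is a hypothetical module name.
os.environ["SCRAPY_SETTINGS_MODULE"] = "myproject.settings"

from scrapy.utils.project import get_project_settings

settings = get_project_settings()
print(settings.get("BOT_NAME"))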
scrapy.cfg            - deploy configuration file
project_name/         - the project's Python module
    __init__.py
    items.py          - the project's items file
    pipelines.py      - the project's pipelines file
    settings.py       - the project's settings file
    spiders/          - directory where the spiders live
        __init__.py
        spider_name.py
        ...
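Of these files, items.py declares the containers for the scraped data. A minimal sketch, with purely illustrative field names:

import scrapy

class ScrapyProjectItem(scrapy.Item):
    # illustrative fields; declare whatever your spider actually extracts
    title = scrapy.Field()
    url = scrapy.Field()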
[settings]
default = [name of the project].settings

[deploy]
#url = http://localhost:6800/
project = [name of the project]
Scrapy X.Y - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  crawl    It puts the spider (handling the URL) to work for crawling data
  fetch    It fetches the response from the given URL
scrapy startproject scrapy_project
cd scrapy_project
scrapy genspider mydomain tw511.com
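genspider writes a spider skeleton into the spiders/ directory; with the arguments above it produces roughly the following (the exact template varies between Scrapy versions):

import scrapy

class MydomainSpider(scrapy.Spider):
    name = "mydomain"
    allowed_domains = ["tw511.com"]
    start_urls = ["http://tw511.com/"]

    def parse(self, response):
        # fill in the extraction logic for each downloaded page
        pass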
scrapy -h
fetch: It fetches the URL using the Scrapy downloader.
runspider: It runs a self-contained spider without having to create a project (see the sketch after this list).
settings: It shows the project setting values.
shell: It is an interactive scraping module for the given URL.
startproject: It creates a new Scrapy project.
version: It displays the Scrapy version.
view: It fetches the URL using the Scrapy downloader and shows the contents in a browser.
crawl: It is used to crawl data using the spider.
check: It runs contract checks on the spiders of the project.
list: It displays the list of available spiders in the project.
edit: It can be used to edit spiders using the editor.
parse: It parses the given URL with the spider.
bench: It is used to run a quick benchmark test (the benchmark tells how many pages per minute can be crawled by Scrapy).
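As referenced in the runspider entry above, a self-contained spider lives in a single file and needs no project around it. A minimal sketch, with an assumed file name and selector:

# standalone_spider.py -- run with: scrapy runspider standalone_spider.py
import scrapy

class StandaloneSpider(scrapy.Spider):
    name = "standalone"
    start_urls = ["http://tw511.com/"]

    def parse(self, response):
        # yield the page title as a scraped item
        yield {"title": response.css("title::text").get()}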
COMMANDS_MODULE = 'mycmd.commands'
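Scrapy derives each command's name from its module name inside that package, so a command registered this way might live in a hypothetical mycmd/commands/cmd_demo.py (all names here are illustrative):

from scrapy.commands import ScrapyCommand

class CmdDemo(ScrapyCommand):
    # restrict the command to being run inside a project
    requires_project = True

    def syntax(self):
        return "[options]"

    def short_desc(self):
        return "Print a demo message"

    def run(self, args, opts):
        print("cmd_demo executed")

With COMMANDS_MODULE = 'mycmd.commands' in settings.py, the command is then invoked as scrapy cmd_demo.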
from setuptools import setup, find_packages

setup(
    name='scrapy-module_demo',
    packages=find_packages(),
    entry_points={
        'scrapy.commands': [
            'cmd_demo=my_module.commands:CmdDemo',
        ],
    },
)
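Once such a package is installed (for example with pip install .), the scrapy.commands entry point registers the command, and scrapy cmd_demo becomes available wherever the package is present.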