
ScrapydWeb now supports customizing the default values of settings & arguments on the Run Spider page

    my8100 · 2019-06-22 16:58:35 +08:00 · 1841 views

    1. Install the update:

    pip install -U git+https://github.com/my8100/scrapydweb.git
    

    2. If you were already using scrapydweb v1.2.0, add the following options to your existing config file:

    
    ############################## Run Spider #####################################
    # The default is False, set it to True to automatically
    # expand the 'settings & arguments' section in the Run Spider page.
    SCHEDULE_EXPAND_SETTINGS_ARGUMENTS = False
    
    # The default is 'Mozilla/5.0', set it to a non-empty string to customize the default value of `custom`
    # in the drop-down list of `USER_AGENT`.
    SCHEDULE_CUSTOM_USER_AGENT = 'Mozilla/5.0'
    
    # The default is None, set it to one of ['custom', 'Chrome', 'iPhone', 'iPad', 'Android']
    # to customize the default value of `USER_AGENT`.
    SCHEDULE_USER_AGENT = None
    
    # The default is None, set it to True or False to customize the default value of `ROBOTSTXT_OBEY`.
    SCHEDULE_ROBOTSTXT_OBEY = None
    
    # The default is None, set it to True or False to customize the default value of `COOKIES_ENABLED`.
    SCHEDULE_COOKIES_ENABLED = None
    
    # The default is None, set it to a non-negative integer to customize the default value of `CONCURRENT_REQUESTS`.
    SCHEDULE_CONCURRENT_REQUESTS = None
    
    # The default is None, set it to a non-negative number to customize the default value of `DOWNLOAD_DELAY`.
    SCHEDULE_DOWNLOAD_DELAY = None
    
    # The default is "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1",
    # set it to '' or any non-empty string to customize the default value of `additional`.
    # Use '\r\n' as the line separator.
    SCHEDULE_ADDITIONAL = "-d setting=CLOSESPIDER_TIMEOUT=60\r\n-d setting=CLOSESPIDER_PAGECOUNT=10\r\n-d arg1=val1"
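
Writing the `'\r\n'`-separated value of `SCHEDULE_ADDITIONAL` by hand is easy to get wrong. Below is a minimal sketch that assembles it from plain dicts and splits it back for inspection. `build_additional` and `split_additional` are hypothetical helper names, not part of scrapydweb, and the post does not show how scrapydweb itself parses the string, so treat this only as an illustration of the `-d setting=NAME=VALUE` / `-d NAME=VALUE` line format used in the default value above.

```python
# Hypothetical helpers (NOT part of scrapydweb): build and inspect a
# SCHEDULE_ADDITIONAL value made of '-d ...' lines joined by '\r\n'.

def build_additional(settings=None, arguments=None):
    """Return '-d' lines joined by CRLF, Scrapy settings first."""
    lines = [f"-d setting={name}={value}"
             for name, value in (settings or {}).items()]
    lines += [f"-d {name}={value}"
              for name, value in (arguments or {}).items()]
    return "\r\n".join(lines)


def split_additional(text):
    """Split such a string back into (settings, arguments) dicts of strings."""
    settings, arguments = {}, {}
    for line in text.split("\r\n"):
        line = line.strip()
        if not line.startswith("-d "):
            continue  # skip blank or unrecognized lines
        key, _, value = line[3:].partition("=")
        if key == "setting":
            # 'setting=NAME=VALUE' carries a Scrapy setting
            name, _, setting_value = value.partition("=")
            settings[name] = setting_value
        else:
            # 'NAME=VALUE' carries a spider argument
            arguments[key] = value
    return settings, arguments


# Reproduces the default value shown in the config above.
SCHEDULE_ADDITIONAL = build_additional(
    settings={"CLOSESPIDER_TIMEOUT": 60, "CLOSESPIDER_PAGECOUNT": 10},
    arguments={"arg1": "val1"},
)
```

Since the config file is ordinary Python, helpers like these could sit in the file itself, or you can paste the resulting string in directly. All values come back from `split_additional` as strings.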
    
    

    3. GitHub

    https://github.com/my8100/scrapydweb
