Python | Crawling Weather Forecasts for Shandong Cities with the Scrapy Framework

Objective:
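Install the required Python extension library, then write a crawler project that scrapes the weather forecast for each city in Shandong from http://www.weather.com.cn/shandong/index.shtml and writes the scraped data to a local text file, weather.txt.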
Steps:

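1. At the command prompt, install the Python extension library Scrapy with pip install scrapy.
2. At the command prompt, create the crawler project with scrapy startproject sdWeatherSpider.
3. Change into the project folder and create the spider with scrapy genspider everyCityinSD www.weather.com.cn.
4. Open http://www.weather.com.cn/shandong/index.shtml in a browser and locate the city forecast list on the page.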
5. Right-click the page, choose "View page source", and find the markup that corresponds to the city forecast list.
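The regular expression the spider later uses to pull city links out of this list can be tried offline before crawling. The sketch below applies the same pattern to a short HTML fragment; the fragment, the city1/city2 URLs and the English city names are hypothetical stand-ins, not copied from the real page, whose markup may differ.

```python
from re import findall

# Same pattern the spider uses to pick city links out of the Shandong index page
PATTERN = '<a title=".*?" href="(.+?)" target="_blank">(.+?)</a>'

# Hypothetical fragment of a city forecast list; the real markup may differ
SAMPLE_HTML = '''
<ul>
  <li><a title="Jinan weather" href="http://www.weather.com.cn/weather/city1.shtml" target="_blank">Jinan</a></li>
  <li><a title="Yantai weather" href="http://www.weather.com.cn/weather/city2.shtml" target="_blank">Yantai</a></li>
</ul>
'''

# The pattern has two groups, so findall returns one (href, link text) tuple per link
links = findall(PATTERN, SAMPLE_HTML)
for href, city in links:
    print(city, href)
```

Because each match is an (href, link text) tuple, the spider keeps only url[0]. Any other link on the real page with the same title/href/target layout would match as well.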

6. Open the forecast page of any city in Shandong; Yantai is used as the example here.

7. Right-click that page, choose "View page source", and find the markup that corresponds to the weather forecast shown on the page.

8. Modify items.py to define the fields to be scraped.

```python
import scrapy


class SdweatherspiderItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    city = scrapy.Field()
    weather = scrapy.Field()
```

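9. Modify the spider file everyCityinSD.py to define how the content is scraped. The extraction rules follow the page analysis above; if the spider does not run correctly, the page structure may have changed, in which case go back to the earlier steps and re-analyze the page source.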

```python
from re import findall
from urllib.request import urlopen

import scrapy

from sdWeatherSpider.items import SdweatherspiderItem


class EverycityinsdSpider(scrapy.Spider):
    name = 'everyCityinSD'
    allowed_domains = ['www.weather.com.cn']
    start_urls = []

    # Iterate over the cities to collect the page URLs to crawl
    # (this runs once, when the class body is executed)
    url = r'http://www.weather.com.cn/shandong/index.shtml'
    with urlopen(url) as fp:
        contents = fp.read().decode()
    # Each match is an (href, city name) tuple
    pattern = '<a title=".*?" href="(.+?)" target="_blank">(.+?)</a>'
    for url in findall(pattern, contents):
        start_urls.append(url[0])

    def parse(self, response):
        # Process the forecast page of one city
        item = SdweatherspiderItem()
        city = response.xpath('//div[@class="crumbs fl"]//a[2]//text()').extract()[0]
        item['city'] = city
        # Each page holds the forecast of a single city, so take [0] directly
        selector = response.xpath('//ul[@class="t clearfix"]')[0]
        # Accumulates the forecast text, one line per day
        weather = ''
        for li in selector.xpath('./li'):
            date = li.xpath('./h1//text()').extract()[0]
            cloud = li.xpath('./p[@title]//text()').extract()[0]
            high = li.xpath('./p[@class="tem"]//span//text()').extract()[0]
            low = li.xpath('./p[@class="tem"]//i//text()').extract()[0]
            wind = li.xpath('./p[@class="win"]//em//span[1]/@title').extract()[0]
            wind = wind + li.xpath('./p[@class="win"]//i//text()').extract()[0]
            weather = weather + date + ':' + cloud + ',' + high + '/' + low + ',' + wind + '\n'
        item['weather'] = weather
        return [item]
```

10. Modify pipelines.py to write the scraped data to weather.txt.

```python
class SdweatherspiderPipeline(object):
    def process_item(self, item, spider):
        # Append each city's name and forecast to weather.txt
        with open('weather.txt', 'a', encoding='utf8') as fp:
            fp.write(item['city'] + '\n')
            fp.write(item['weather'] + '\n')
        return item
```
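Because process_item only reads item['city'] and item['weather'], the pipeline can be exercised with a plain dict in place of a SdweatherspiderItem. This sketch assumes a temporary working directory and made-up forecast text; the spider argument is not used by the pipeline, so None is passed.

```python
import os
import tempfile


class SdweatherspiderPipeline(object):
    def process_item(self, item, spider):
        # Append each city's name and forecast to weather.txt
        with open('weather.txt', 'a', encoding='utf8') as fp:
            fp.write(item['city'] + '\n')
            fp.write(item['weather'] + '\n')
        return item


# A plain dict stands in for the Item; the sample text is made up
sample = {'city': 'Yantai', 'weather': 'Day 1:Sunny,28/19,South wind Level 3\n'}

old_cwd = os.getcwd()
with tempfile.TemporaryDirectory() as workdir:
    os.chdir(workdir)  # weather.txt is created relative to the working directory
    try:
        returned = SdweatherspiderPipeline().process_item(sample, spider=None)
        with open('weather.txt', encoding='utf8') as fp:
            content = fp.read()
    finally:
        os.chdir(old_cwd)  # leave the directory before it is deleted

print(content, end='')
```

Since the file is opened in append mode ('a'), running the crawl again adds a second copy of every city to weather.txt; delete the file first if only the latest results are wanted. The relative path also means weather.txt is created in the directory from which scrapy crawl is run.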

11. Modify settings.py to register the item pipeline that processes the scraped data.

```python
BOT_NAME = 'sdWeatherSpider'

SPIDER_MODULES = ['sdWeatherSpider.spiders']
NEWSPIDER_MODULE = 'sdWeatherSpider.spiders'

ITEM_PIPELINES = {
    'sdWeatherSpider.pipelines.SdweatherspiderPipeline': 1,
}
```
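The integer in ITEM_PIPELINES is the pipeline's order value: Scrapy passes items through pipelines from lower to higher values, and the values are customarily kept in the 0-1000 range. With a single pipeline, 1 simply enables it. The sketch below only models that ordering rule with sorted(); it is not Scrapy's own code, and HypotheticalCleanupPipeline is an invented name used purely for illustration.

```python
# Modelled on the project's settings; the first entry is hypothetical
ITEM_PIPELINES = {
    'sdWeatherSpider.pipelines.HypotheticalCleanupPipeline': 500,  # hypothetical
    'sdWeatherSpider.pipelines.SdweatherspiderPipeline': 1,
}


def pipeline_order(pipelines):
    """Return pipeline paths sorted by order value, lowest (runs first) to highest."""
    return [path for path, order in sorted(pipelines.items(), key=lambda kv: kv[1])]


for path in pipeline_order(ITEM_PIPELINES):
    print(path)
```

Under this rule, the project's own pipeline (value 1) would see each item before the hypothetical one (value 500), regardless of the order in which the dict lists them.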

12. Switch to the command prompt and run the spider with scrapy crawl everyCityinSD.