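Requirements
Fetch the epidemic risk level of every region from the national government website and save it to a spreadsheet. The final result looks like this:
Data source
Analyzing the web page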
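1. Page structure
Top: the cutoff time and three buttons (high / medium / low); clicking one switches the information shown in the middle.
Middle: risk area information.
Bottom: pagination.
2. Determine the request method
Because there is a pagination button, the data is probably updated via ajax.
Open DevTools (F12), go to the Network tab, clear the existing entries, and refresh the page.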
3. Analyze the request information
```
url = http://bmfw.www.gov.cn/bjww/interface/interfaceJson
# Request method: POST
# Request headers
headers = {
    Accept: application/json, text/plain, */*
    Accept-Encoding: gzip, deflate
    Accept-Language: zh-CN,zh;q=0.9
    Connection: keep-alive
    Content-Length: 235
    Content-Type: application/json;charset=UTF-8
    Cookie: wdcid=57661336733ee69d; _gscu_1088464070=62382713kmnu2p11; __auc=5e75e3b61830dbb29f71ed50e8e; wdses=7e7e0e45f5b9f4e6; _gscbrs_1088464070=1; __asc=578b530b18312ed51c75133a6b5; acw_tc=2760823f16624698875982607ee72639a46b54881093f71cc7ab12b65f0a17; wdlast=1662469913; _gscs_1088464070=62469886ezpysq11|pv:2; SERVERID=edf8bc70025336506334b22603ae1cc6|1662469904|1662469877
    Host: bmfw.www.gov.cn
    Origin: http://bmfw.www.gov.cn
    Referer: http://bmfw.www.gov.cn/yqfxdjcx/risk.html
    User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
    x-wif-nonce: QkjjtiLM2dCratiA
    x-wif-paasid: smt-application
    x-wif-signature: B80277094A20F7C04C735C8413BE2B014332114512BFB51BADD30E21D5C368D9
    x-wif-timestamp: 1662469914
}
# POST request data
from_data = {
    "key": "3C502C97ABDA40D0A60FBEE50FAAD1DA",
    "appId": "NcApplication",
    "paasHeader": "zdww",
    "timestampHeader": "1662469914",
    "nonceHeader": "123456789abcdefg",
    "signatureHeader": "B0BF67E09448D9A8A6C0538B259E715FD51CB51FCD6822E85000C2196354EB0B"
}
```
Most of these are routine request fields. The ones worth a closer look are:
x-wif-nonce
x-wif-paasid
x-wif-signature
x-wif-timestamp
timestampHeader
signatureHeader
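4. Confirm the request header parameters
In the Sources panel, under "XHR/fetch Breakpoints" on the right, enter the tail of the interface URL:
/bjww/interface/interfaceJson, then refresh the page. Execution pauses at the request. Look at the Scope pane: if it shows "/bjww/interface/interfaceJson", you can inspect the headers and other data there.
The paused request shows that only the following headers are checked:
Accept
Content-Type
x-wif-nonce
x-wif-paasid
x-wif-signature
x-wif-timestamp
The request is a POST, and the request body is a string that contains a dictionary (a JSON object).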
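To make the "string that contains a dictionary" point concrete, here is a minimal sketch that builds such a request without sending it. It uses the standard library's `urllib.request` purely for illustration (the article's crawler uses `requests`), and the helper name `build_request` is hypothetical. The header and body values are the stale ones from the captured request, so the server would most likely reject them if they were actually sent.

```python
import json
import urllib.request

URL = "http://bmfw.www.gov.cn/bjww/interface/interfaceJson"


def build_request(headers, payload):
    """Create (but do not send) a POST request whose body is a JSON string."""
    body = json.dumps(payload)  # dict -> str, which is what the interface receives
    return urllib.request.Request(
        URL, data=body.encode("utf-8"), headers=headers, method="POST"
    )


if __name__ == "__main__":
    # Values copied from the captured request above; the timestamp is old.
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Content-Type": "application/json;charset=UTF-8",
        "x-wif-nonce": "QkjjtiLM2dCratiA",
        "x-wif-paasid": "smt-application",
        "x-wif-signature": "B80277094A20F7C04C735C8413BE2B014332114512BFB51BADD30E21D5C368D9",
        "x-wif-timestamp": "1662469914",
    }
    payload = {
        "key": "3C502C97ABDA40D0A60FBEE50FAAD1DA",
        "appId": "NcApplication",
        "paasHeader": "zdww",
        "timestampHeader": "1662469914",
        "nonceHeader": "123456789abcdefg",
        "signatureHeader": "B0BF67E09448D9A8A6C0538B259E715FD51CB51FCD6822E85000C2196354EB0B",
    }
    req = build_request(headers, payload)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Nothing here touches the network; the request object is only constructed so that the method and body can be inspected.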
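5. Analyze the JS
5.1 Search for keywords such as x-wif-nonce
In the Network tab, press Ctrl+F to open the search box and search for x-wif-nonce.
This finds 3 matches in 2 JS files.
Start with the first and second matches:
These are already commented out, so they cannot be the ones.
Now look at the third match:
This is clearly what we are looking for: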
```javascript
function p(t) {
    if (!t.commonHeaders)
        return t;
    var e = o.value,
        n = CryptoJS.SHA256(e + "fTN2pfuisxTavbTuYVSsNJHetwq5bJvCQkjjtiLM2dCratiA" + e).toString(CryptoJS.enc.Hex).toUpperCase();
    return Object.assign(t.headers, {
        "x-wif-nonce": "QkjjtiLM2dCratiA",
        "x-wif-paasid": "smt-application",
        "x-wif-signature": n,
        "x-wif-timestamp": e
    }), t
}
```
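x-wif-nonce and x-wif-paasid are fixed values: "QkjjtiLM2dCratiA" and "smt-application", respectively.
x-wif-signature and x-wif-timestamp are set from the variables n and e. Set a breakpoint here to inspect them:
e is a timestamp.
n is SHA256(e + "fTN2pfuisxTavbTuYVSsNJHetwq5bJvCQkjjtiLM2dCratiA" + e), as an uppercase hex string.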
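The header logic of `p(t)` can be sketched in Python roughly as follows. The function name `make_wif_headers` and the constant name `WIF_SALT` are made up for this illustration; the constant's value and the fixed header values are copied from the JS above. The sketch assumes the timestamp is a Unix time in seconds, which matches the 10-digit value in the captured request.

```python
import hashlib
import time

# Constant string copied from the JS function p(t).
WIF_SALT = "fTN2pfuisxTavbTuYVSsNJHetwq5bJvCQkjjtiLM2dCratiA"


def make_wif_headers(timestamp=None):
    """Build the four x-wif-* headers for a Unix timestamp given as a string."""
    if timestamp is None:
        # Assumption: seconds since the epoch, as in the captured request.
        timestamp = str(int(time.time()))
    signature = hashlib.sha256(
        (timestamp + WIF_SALT + timestamp).encode("utf-8")
    ).hexdigest().upper()  # uppercase hex, like toString(Hex).toUpperCase()
    return {
        "x-wif-nonce": "QkjjtiLM2dCratiA",
        "x-wif-paasid": "smt-application",
        "x-wif-signature": signature,
        "x-wif-timestamp": timestamp,
    }


if __name__ == "__main__":
    for name, value in make_wif_headers().items():
        print(name, value)
```

Passing a fixed timestamp makes the output reproducible, which is handy when comparing against a value seen in the browser.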
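Next, verify that the SHA256 used here is the standard algorithm. The string in the purple box of the screenshot is the value computed by the page:
This confirms that the standard SHA256 algorithm is used.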
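One way to run this kind of check locally is to hash the same input with Python's `hashlib` and compare it with the string shown in the browser. The sketch below is only an illustration: the helper names are hypothetical, and since the page's value from the screenshot is not reproduced here, the demo uses the well-known SHA-256 test vector for "abc" instead.

```python
import hashlib


def sha256_upper_hex(message):
    """Standard SHA-256 of a UTF-8 string, as an uppercase hex digest."""
    return hashlib.sha256(message.encode("utf-8")).hexdigest().upper()


def matches_page_value(message, page_hex):
    """Return True if hashlib's digest equals the hex string seen on the page."""
    return sha256_upper_hex(message) == page_hex.strip().upper()


# Published SHA-256 test vector for the input "abc".
ABC_DIGEST = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

if __name__ == "__main__":
    # In practice, pass the string built at the breakpoint and the page's digest.
    print(matches_page_value("abc", ABC_DIGEST))
```

If the comparison fails for the page's own input, the site is probably using a modified hash or a different input string, and the JS needs a closer look.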
5.2 Search for timestampHeader
In the Network tab, search for timestampHeader.
timestampHeader is a timestamp.
signatureHeader is SHA256(timestampHeader + '23y0ufFl5YxIyGrI8hWRUZmKkvtSjLQA' + '123456789abcdefg' + timestampHeader).
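Putting the two body fields together, the POST data could be assembled along these lines. This is a sketch under the assumptions stated in the text: `build_body`, `BODY_SALT` and `NONCE_HEADER` are illustrative names, the fixed field values come from the captured request, and the digest is assumed to be uppercase hex like the captured signatureHeader.

```python
import hashlib
import json
import time

# Constants taken from the formula and the captured request body above.
BODY_SALT = "23y0ufFl5YxIyGrI8hWRUZmKkvtSjLQA"
NONCE_HEADER = "123456789abcdefg"


def build_body(timestamp=None):
    """Return the POST body as a JSON string for the given timestamp string."""
    if timestamp is None:
        timestamp = str(int(time.time()))
    raw = timestamp + BODY_SALT + NONCE_HEADER + timestamp
    signature = hashlib.sha256(raw.encode("utf-8")).hexdigest().upper()
    payload = {
        "key": "3C502C97ABDA40D0A60FBEE50FAAD1DA",
        "appId": "NcApplication",
        "paasHeader": "zdww",
        "timestampHeader": timestamp,
        "nonceHeader": NONCE_HEADER,
        "signatureHeader": signature,
    }
    return json.dumps(payload)


if __name__ == "__main__":
    print(build_body())
```

Using `json.dumps` instead of hand-written escaping keeps the quoting correct if a field value ever changes.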
Writing the code
1. Crawler code
```python
import csv
import hashlib
import json
import os
import time

import requests


def show_level_count(x_list):
    """Print and return the number of communities in one risk level."""
    count = sum(len(area["communitys"]) for area in x_list)
    print(count)
    return count


def writer_to_csv(risk_txt):
    """Parse the response text and write one CSV row per community."""
    risk_json = json.loads(risk_txt)
    so_far_time = risk_json["data"]["end_update_time"]
    # The level labels are written to the CSV as-is.
    level_dict = {
        "高风险": risk_json["data"]["highlist"],    # high risk
        "中风险": risk_json["data"]["middlelist"],  # medium risk
        "低风险": risk_json["data"]["lowlist"],     # low risk
    }
    csv_name = 'risk_data_' + so_far_time + '.csv'
    # utf_8_sig adds a BOM so that Excel detects the encoding correctly.
    with open(csv_name, 'w', encoding='utf_8_sig', newline='') as f:
        csv_writer = csv.writer(f)
        for risk_level, areas in level_dict.items():
            for area in areas:
                for community in area["communitys"]:
                    csv_writer.writerow([risk_level, area["province"],
                                         area["city"], area["county"],
                                         community])
    print("Finished writing " + csv_name)


def get_risk_area_data():
    """POST to the interface and return (response text, HTTP status code)."""
    timestamp = str(int(time.time()))
    x_wif_nonce = 'QkjjtiLM2dCratiA'
    x_wif_paasid = 'smt-application'
    # Header signature: SHA256(timestamp + fixed string + timestamp).
    x_wif_signature_str = timestamp + \
        'fTN2pfuisxTavbTuYVSsNJHetwq5bJvCQkjjtiLM2dCratiA' + timestamp
    x_wif_signature = hashlib.sha256(
        x_wif_signature_str.encode('utf-8')).hexdigest().upper()
    # Body signature: SHA256(timestamp + fixed string + nonce + timestamp).
    signatureHeader_str = timestamp + \
        '23y0ufFl5YxIyGrI8hWRUZmKkvtSjLQA' + '123456789abcdefg' + timestamp
    signatureHeader = hashlib.sha256(
        signatureHeader_str.encode('utf-8')).hexdigest().upper()

    url = 'http://bmfw.www.gov.cn/bjww/interface/interfaceJson'
    headers = {
        'Accept': "application/json, text/plain, */*",
        'Content-Type': "application/json;charset=utf-8",
        'x-wif-nonce': x_wif_nonce,
        'x-wif-paasid': x_wif_paasid,
        'x-wif-signature': x_wif_signature,
        'x-wif-timestamp': timestamp,
    }
    # The body is sent as a JSON string, not as form fields.
    from_data = json.dumps({
        "key": "3C502C97ABDA40D0A60FBEE50FAAD1DA",
        "appId": "NcApplication",
        "paasHeader": "zdww",
        "timestampHeader": timestamp,
        "nonceHeader": "123456789abcdefg",
        "signatureHeader": signatureHeader,
    })
    response = requests.post(url=url, data=from_data, headers=headers)
    if response.status_code != 200:
        return "", response.status_code
    # Strip the bullet characters that appear in the response text.
    return response.text.replace('•', ''), response.status_code


if __name__ == '__main__':
    risk_text, status_code = get_risk_area_data()
    if status_code == 200:
        # Keep a copy of the raw response, then convert it to CSV.
        with open('./risk_data.json', 'w', encoding='utf-8') as f:
            f.write(risk_text)
        print("Finished writing risk_data.json")
        with open('risk_data.json', 'r', encoding='utf-8') as f:
            risk_txt = f.read()
        writer_to_csv(risk_txt)
    print('All done. Please do not run this too often!')
    os.system('pause')  # Windows only: keep the console window open
```
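The crawler defines `show_level_count` but never calls it. As a rough illustration of how per-level totals might be computed from the parsed response, the sketch below counts communities for all three levels at once. The function `count_communities` and the `SAMPLE` payload are invented for this example: the payload only mimics the field names the crawler reads (`highlist`, `middlelist`, `lowlist`, `communitys`) and contains no real data.

```python
# Maps the CSV level label to the key used in the response JSON.
LEVEL_KEYS = {"高风险": "highlist", "中风险": "middlelist", "低风险": "lowlist"}


def count_communities(risk_json):
    """Return {level label: number of communities} for a parsed response."""
    counts = {}
    for level, key in LEVEL_KEYS.items():
        areas = risk_json["data"][key]
        counts[level] = sum(len(area["communitys"]) for area in areas)
    return counts


# Hypothetical sample with the same shape as the fields used by the crawler.
SAMPLE = {
    "data": {
        "end_update_time": "sample",
        "highlist": [
            {"province": "P1", "city": "C1", "county": "X1",
             "communitys": ["community a", "community b"]},
        ],
        "middlelist": [
            {"province": "P2", "city": "C2", "county": "X2",
             "communitys": ["community c"]},
        ],
        "lowlist": [],
    }
}

if __name__ == "__main__":
    print(count_communities(SAMPLE))
```

Comparing these totals with the row count of the generated CSV is a quick way to check that no rows were dropped.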