Learning Python: Scraping Web Page Images with requests

極客小將, 2021-02-19

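Preface

Recently I wanted to make a small Link Up (連連看, tile-matching) game for the kids at home, but I was stuck on finding image material. Then I found inspiration on an expert's blog: why not borrow my followers' avatars and try scraping images from a web page? (Thank you all!)

Overall goal:
Scrape follower avatars to use as material, then build a Link Up mini-game.
This article is therefore split into two parts:
1. Scraping the material;
2. Building the Link Up mini-game from the material (link)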

(I) Scraping follower avatars

Objective:

Use a crawler to scrape the followers' avatars and store them in sequential order as material for the game. Some of the scraped avatars are the CSDN default avatar, so duplicates occur; the images therefore also need to be deduplicated to obtain a complete, non-repeating image set.

I. Preparation

1. A Python environment

2. The Python libraries involved must be installed with pip install <package name>

II. Writing the code

1. Scraping the content

(1) Required libraries

    import requests
    import json

(2) Get the request URL

    # Request URL for the "users following me" list
    url = 'https://blog.csdn.net//phoenix/web/v1/fans/list?page=1&pageSize=40&blogUsername=weixin_45386875'
    # Request URL for the "users I follow" list
    # url = 'https://blog.csdn.net//phoenix/web/v1/follow/list?page=1&pageSize=40&blogUsername=weixin_45386875'

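How to find the request URL:
Right-click the part of the page you want to scrape, choose Inspect Element, and locate the file shown in the figure.

Note: the request only appears after the page is switched to the “TA的粉絲(13)” (their followers) tab. If the Network panel is empty, refresh the page and the requests will show up.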
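The request URL found this way is an endpoint followed by three query parameters (page, pageSize, blogUsername). As a minimal sketch, assuming the endpoint treats other page numbers the same way (not verified here), the URL can be assembled from its parts with the standard library. The helper name build_fans_url is hypothetical and does not appear in the original code.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Endpoint copied from the request URL above; whether it still responds is not checked here.
FANS_ENDPOINT = 'https://blog.csdn.net//phoenix/web/v1/fans/list'


def build_fans_url(username, page=1, page_size=40):
    """Hypothetical helper: assemble the request URL from its query parameters."""
    query = urlencode({'page': page, 'pageSize': page_size, 'blogUsername': username})
    return FANS_ENDPOINT + '?' + query


url = build_fans_url('weixin_45386875')
print(url)

# Reading the parameters back is a quick way to inspect a URL copied from the browser.
params = parse_qs(urlsplit(url).query)
print(params)
```

Keeping the parameters separate from the endpoint makes it easier to change the user name or page size without editing a long string by hand.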
(3) Send the request with request headers as a simple disguise

    # The Cookie below is the author's own session; copy the value from your own browser.
    header = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
        'Cookie': (
            'uuid_tt_dd=10_30826311340-1612520858912-361156; '
            'Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_30826311340-1612520858912-361156!5744*1*weixin_45386875!1788*1*PC_VC; '
            'UN=weixin_45386875; p_uid=U010000; '
            'ssxmod_itna=Qui=DKiI3hkbG=DXDnD+r8h9eD53ecxPPit5bP1ODlOaYxA5D8D6DQeGTbcW1AoWGATqFYKmEWiH5/gbO4FpjQGcxLbbYx0aDbqGkqnU40rDzuQGuD0KGRDD4GEDCHHDIxGUBuDeKDR8qDg7gQcCM=DbSdDKqDHR+4FbG4oD8PdS0p=C7Gt3AuQ3DmtSije3r424rQ+iPqWzPDA3DK6jiD==; '
            'ssxmod_itna2=Qui=DKiI3hkbG=DXDnD+r8h9eD53ecxPPit5bP1D66Ii40vah303pFcXW0D6QALW==0tWLGtzKPUA76xoU10vpqD6AoOqs1R=Db=3olozYp0wVxUS0r/GeZCqzVezFQc8dZon7efYcr=1nhNr6nWKcTqqaDQYcwYSA+hNaqem=WWuDuDQ/+1PGEsN=atvS7WDp07vFuFDherg0AP0KFw0ea6kcTtK2rh/fy=/De0n1FNk+ONYxCXr=QrdTj6gxCuNNWXvp1IDdl2Ckjc=N/cqV6SmHZIZIuOEqml=dHMroMFDvdMVr8afnyQ+sbGPCbt3xdD07tAdRD7uDQ0gT=Bh7OYblbtYQFDDLxD2tGDD===; '
            'UserName=weixin_45386875; UserInfo=9863b829527c49a3ba1622396deaa7d9; UserToken=9863b829527c49a3ba1622396deaa7d9; UserNick=ryc875327878; AU=01F; BT=1612846374580; '
            'Hm_up_6bcd52f51e9b3dce32bec4a3997715ac=%7B%22uid_%22%3A%7B%22value%22%3A%22weixin_45386875%22%2C%22scope%22%3A1%7D%2C%22islogin%22%3A%7B%22value%22%3A%221%22%2C%22scope%22%3A1%7D%2C%22isonline%22%3A%7B%22value%22%3A%221%22%2C%22scope%22%3A1%7D%2C%22isvip%22%3A%7B%22value%22%3A%220%22%2C%22scope%22%3A1%7D%7D; '
            '__gads=ID=94978f740e79c9e5-22c918ed05c600ea:T=1613266189:RT=1613266189:S=ALNI_Mbwb8ad5kdYjogF7yImerVAzKaJuQ; dc_session_id=10_1613272889543.735028; '
            'announcement-new=%7B%22isLogin%22%3Atrue%2C%22announcementUrl%22%3A%22https%3A%2F%2Fblog.csdn.net%2Fblogdevteam%2Farticle%2Fdetails%2F112280974%3Futm_source%3Dgonggao_0107%22%2C%22announcementCount%22%3A0%2C%22announcementExpire%22%3A3600000%7D; '
            'dc_sid=3784575ebe1e9d08a29b0e3fc3621328; c_first_ref=default; c_first_page=https%3A//www.csdn.net/; c_segment=4; '
            'Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1613222907,1613266055,1613268241,1613273899; TY_SESSION_ID=82f0aa61-9b28-49b2-a854-b18414426735; '
            'c_pref=; c_ref=https%3A//www.csdn.net/; c_page_id=default; dc_tos=qoi2fq; log_Id_pv=925; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1613274327; log_Id_view=905; log_Id_click=658'
        ),
    }

How to get the request headers:


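(4) Send the request to the page

    try:
        data = requests.get(url, headers=header).text  # response body as a string
        data_dist = json.loads(data)                   # parse the JSON string into a dict
    except (requests.RequestException, ValueError):    # network error or invalid JSON
        print('Scraping failed')
        exit()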
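To illustrate what happens to the parsed response afterwards, here is a small sketch that runs offline. The sample text is made up; only the keys data, list, nickname and userAvatar are taken from the article's own code, and the helper name extract_avatars is hypothetical.

```python
import json

# Made-up sample shaped like the response this article works with.
# Only the keys 'data', 'list', 'nickname' and 'userAvatar' come from the article's code.
sample_text = '''
{"data": {"list": [
    {"nickname": "fan_a", "userAvatar": "https://example.com/a.jpg"},
    {"nickname": "fan_b", "userAvatar": "https://example.com/b.jpg"}
]}}
'''


def extract_avatars(text):
    """Hypothetical helper: return (nickname, avatar URL) pairs, or [] if the text is unusable."""
    try:
        payload = json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return []
    data = payload.get('data') if isinstance(payload, dict) else None
    fans = data.get('list') if isinstance(data, dict) else None
    return [
        (fan['nickname'], fan['userAvatar'])
        for fan in fans or []
        if isinstance(fan, dict) and 'nickname' in fan and 'userAvatar' in fan
    ]


pairs = extract_avatars(sample_text)
print(pairs)
```

Returning an empty list for an unexpected response (for example an HTML login page instead of JSON) lets the calling code decide whether to stop or retry.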

2. Saving the images and removing duplicates

Define a few helper functions first

    from urllib import request
    from functools import reduce
    import math
    import operator
    import os
    from PIL import Image

    # Folder where the avatars are stored
    # (a raw string, so the backslashes are not read as escape sequences)
    SAVE_DIR = r'D:\rycpython_learning\10_linkup\fan_avatar'

    # Save a file
    def save_imag(file_name, img_url):
        request.urlretrieve(url=img_url, filename=os.path.join(SAVE_DIR, file_name))

    # Delete all files under a folder
    def del_file(path):
        ls = os.listdir(path)
        for i in ls:
            c_path = os.path.join(path, i)
            if os.path.isdir(c_path):  # a folder: recurse into it
                del_file(c_path)
            else:                      # a file: delete it directly
                os.remove(c_path)
        print('Folder has been emptied')

    # Image deduplication
    def compare_images(pic1, pic2):
        # "with" closes both files so that a duplicate can be deleted afterwards
        with Image.open(pic1) as image1, Image.open(pic2) as image2:
            histogram1 = image1.histogram()
            histogram2 = image2.histogram()
        differ = math.sqrt(reduce(operator.add, list(map(lambda a, b: (a - b) ** 2, histogram1, histogram2))) / len(histogram1))
        print('differ:', differ)
        if differ == 0:
            return 'same'
        else:
            return 'diff'

    # Delete the image at the given path
    def del_avatar(path):
        if os.path.exists(path):  # the file exists
            os.remove(path)
        else:
            print('no such file:%s' % (path))  # report that the file does not exist

    # The part below belongs inside spider_fanavatar(), after the response has been parsed
    fan_list = data_dist['data']['list']  # extract the content we need
    # Empty the folder first
    del_file(SAVE_DIR)
    index = 0
    # i is the index into the scraped list; index is the index of the saved image
    for i in range(0, len(fan_list)):
        fan_username = fan_list[i]['nickname']
        # print('fans_user%s:' % (i + 1), fan_username)
        fan_avatar_url = fan_list[i]['userAvatar']
        # print('fans_avatar_url%s:' % (i + 1), fan_avatar_url)
        save_imag('fans_avatar%s.jpg' % (index + 1), fan_avatar_url)
        # print('----------------save_image--fans_avatar%s.jpg' % (index + 1))
        # Deduplicate: compare the new image with every image kept so far
        new_path = os.path.join(SAVE_DIR, 'fans_avatar%s.jpg' % (index + 1))
        for j in range(0, index):
            old_path = os.path.join(SAVE_DIR, 'fans_avatar%s.jpg' % (j + 1))
            comp_res = compare_images(new_path, old_path)
            print('--------compare_images:--------' + new_path + '------with---' + old_path)
            print('comp_res:', comp_res)
            if comp_res == 'same':
                del_avatar(new_path)
                print(new_path)
                index = index - 1
                break
        index = index + 1
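The comparison in compare_images boils down to a root-mean-square difference between two histograms, with 0 meaning "same". The sketch below reproduces that arithmetic on toy data without Pillow, so it can be run anywhere. The helper names histogram and rms_difference and the pixel values are made up for illustration. It also shows one limitation worth knowing: two different images with the same pixel counts (for example a mirrored copy) score 0 as well.

```python
import math


def histogram(pixels, bins=4):
    """Count how many pixels take each value in range(bins)."""
    counts = [0] * bins
    for value in pixels:
        counts[value] += 1
    return counts


def rms_difference(histogram1, histogram2):
    """Root-mean-square difference between two equal-length histograms."""
    if len(histogram1) != len(histogram2):
        raise ValueError('histograms must have the same length')
    squared = sum((a - b) ** 2 for a, b in zip(histogram1, histogram2))
    return math.sqrt(squared / len(histogram1))


# Toy one-row "images" with pixel values 0..3 (made-up numbers, not real image data)
row = [0, 0, 3, 1, 3]
mirrored = row[::-1]        # different layout, same pixel counts
brighter = [3, 3, 3, 1, 3]  # different pixel counts

same_score = rms_difference(histogram(row), histogram(row))
mirror_score = rms_difference(histogram(row), histogram(mirrored))
other_score = rms_difference(histogram(row), histogram(brighter))
print(same_score, mirror_score, other_score)
```

For spotting repeated copies of one default avatar this is usually enough. Note that Pillow returns 256 counts per band, so an RGB image yields 768 values and a grayscale one 256; the map call in compare_images silently stops at the shorter list, whereas the sketch above raises an error instead.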

3. Calling the function

    if __name__ == "__main__":
        spider_fanavatar()

III. Complete code

    # Scrape web page images
    import requests
    from urllib import request
    import json
    from PIL import Image
    import os
    import math
    import operator
    from functools import reduce

    # Folder where the avatars are stored
    # (a raw string, so the backslashes are not read as escape sequences)
    SAVE_DIR = r'D:\rycpython_learning\10_linkup\fan_avatar'

    # Save a file
    def save_imag(file_name, img_url):
        request.urlretrieve(url=img_url, filename=os.path.join(SAVE_DIR, file_name))

    # Scrape the followers' avatars
    def spider_fanavatar():
        # Request URL for the "users following me" list
        url = 'https://blog.csdn.net//phoenix/web/v1/fans/list?page=1&pageSize=40&blogUsername=weixin_45386875'
        # Request URL for the "users I follow" list
        # url = 'https://blog.csdn.net//phoenix/web/v1/follow/list?page=1&pageSize=40&blogUsername=weixin_45386875'
        # The Cookie below is the author's own session; copy the value from your own browser.
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
            'Cookie': (
                'uuid_tt_dd=10_30826311340-1612520858912-361156; '
                'Hm_ct_6bcd52f51e9b3dce32bec4a3997715ac=6525*1*10_30826311340-1612520858912-361156!5744*1*weixin_45386875!1788*1*PC_VC; '
                'UN=weixin_45386875; p_uid=U010000; '
                'ssxmod_itna=Qui=DKiI3hkbG=DXDnD+r8h9eD53ecxPPit5bP1ODlOaYxA5D8D6DQeGTbcW1AoWGATqFYKmEWiH5/gbO4FpjQGcxLbbYx0aDbqGkqnU40rDzuQGuD0KGRDD4GEDCHHDIxGUBuDeKDR8qDg7gQcCM=DbSdDKqDHR+4FbG4oD8PdS0p=C7Gt3AuQ3DmtSije3r424rQ+iPqWzPDA3DK6jiD==; '
                'ssxmod_itna2=Qui=DKiI3hkbG=DXDnD+r8h9eD53ecxPPit5bP1D66Ii40vah303pFcXW0D6QALW==0tWLGtzKPUA76xoU10vpqD6AoOqs1R=Db=3olozYp0wVxUS0r/GeZCqzVezFQc8dZon7efYcr=1nhNr6nWKcTqqaDQYcwYSA+hNaqem=WWuDuDQ/+1PGEsN=atvS7WDp07vFuFDherg0AP0KFw0ea6kcTtK2rh/fy=/De0n1FNk+ONYxCXr=QrdTj6gxCuNNWXvp1IDdl2Ckjc=N/cqV6SmHZIZIuOEqml=dHMroMFDvdMVr8afnyQ+sbGPCbt3xdD07tAdRD7uDQ0gT=Bh7OYblbtYQFDDLxD2tGDD===; '
                'UserName=weixin_45386875; UserInfo=9863b829527c49a3ba1622396deaa7d9; UserToken=9863b829527c49a3ba1622396deaa7d9; UserNick=ryc875327878; AU=01F; BT=1612846374580; '
                'Hm_up_6bcd52f51e9b3dce32bec4a3997715ac=%7B%22uid_%22%3A%7B%22value%22%3A%22weixin_45386875%22%2C%22scope%22%3A1%7D%2C%22islogin%22%3A%7B%22value%22%3A%221%22%2C%22scope%22%3A1%7D%2C%22isonline%22%3A%7B%22value%22%3A%221%22%2C%22scope%22%3A1%7D%2C%22isvip%22%3A%7B%22value%22%3A%220%22%2C%22scope%22%3A1%7D%7D; '
                '__gads=ID=94978f740e79c9e5-22c918ed05c600ea:T=1613266189:RT=1613266189:S=ALNI_Mbwb8ad5kdYjogF7yImerVAzKaJuQ; dc_session_id=10_1613272889543.735028; '
                'announcement-new=%7B%22isLogin%22%3Atrue%2C%22announcementUrl%22%3A%22https%3A%2F%2Fblog.csdn.net%2Fblogdevteam%2Farticle%2Fdetails%2F112280974%3Futm_source%3Dgonggao_0107%22%2C%22announcementCount%22%3A0%2C%22announcementExpire%22%3A3600000%7D; '
                'dc_sid=3784575ebe1e9d08a29b0e3fc3621328; c_first_ref=default; c_first_page=https%3A//www.csdn.net/; c_segment=4; '
                'Hm_lvt_6bcd52f51e9b3dce32bec4a3997715ac=1613222907,1613266055,1613268241,1613273899; TY_SESSION_ID=82f0aa61-9b28-49b2-a854-b18414426735; '
                'c_pref=; c_ref=https%3A//www.csdn.net/; c_page_id=default; dc_tos=qoi2fq; log_Id_pv=925; Hm_lpvt_6bcd52f51e9b3dce32bec4a3997715ac=1613274327; log_Id_view=905; log_Id_click=658'
            ),
        }
        try:
            data = requests.get(url, headers=header).text  # the returned string
            data_dist = json.loads(data)                   # convert the string to a dict
        except (requests.RequestException, ValueError):    # network error or invalid JSON
            print('Scraping failed')
            exit()
        fan_list = data_dist['data']['list']  # extract the content we need
        # Empty the folder first
        del_file(SAVE_DIR)
        index = 0
        # i is the index into the scraped list; index is the index of the saved image
        for i in range(0, len(fan_list)):
            fan_username = fan_list[i]['nickname']
            # print('fans_user%s:' % (i + 1), fan_username)
            fan_avatar_url = fan_list[i]['userAvatar']
            # print('fans_avatar_url%s:' % (i + 1), fan_avatar_url)
            save_imag('fans_avatar%s.jpg' % (index + 1), fan_avatar_url)
            # print('----------------save_image--fans_avatar%s.jpg' % (index + 1))
            # Deduplicate: compare the new image with every image kept so far
            new_path = os.path.join(SAVE_DIR, 'fans_avatar%s.jpg' % (index + 1))
            for j in range(0, index):
                old_path = os.path.join(SAVE_DIR, 'fans_avatar%s.jpg' % (j + 1))
                comp_res = compare_images(new_path, old_path)
                print('--------compare_images:--------' + new_path + '------with---' + old_path)
                print('comp_res:', comp_res)
                if comp_res == 'same':
                    del_avatar(new_path)
                    print(new_path)
                    index = index - 1
                    break
            index = index + 1

    # Image deduplication
    def compare_images(pic1, pic2):
        # "with" closes both files so that a duplicate can be deleted afterwards
        with Image.open(pic1) as image1, Image.open(pic2) as image2:
            histogram1 = image1.histogram()
            histogram2 = image2.histogram()
        differ = math.sqrt(reduce(operator.add, list(map(lambda a, b: (a - b) ** 2, histogram1, histogram2))) / len(histogram1))
        print('differ:', differ)
        if differ == 0:
            return 'same'
        else:
            return 'diff'

    # Delete the image at the given path
    def del_avatar(path):
        if os.path.exists(path):  # the file exists
            os.remove(path)
        else:
            print('no such file:%s' % (path))  # report that the file does not exist

    # Delete all files under a folder
    def del_file(path):
        ls = os.listdir(path)
        for i in ls:
            c_path = os.path.join(path, i)
            if os.path.isdir(c_path):  # a folder: recurse into it
                del_file(c_path)
            else:                      # a file: delete it directly
                os.remove(c_path)
        print('Folder has been emptied')

    if __name__ == "__main__":
        spider_fanavatar()
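As a side note, here is a different deduplication technique from the histogram comparison used in this article: hashing the downloaded bytes. It only catches byte-for-byte identical downloads, which is what repeated default avatars would be if the server returns the same file each time (an assumption, not checked here). The names dedupe_by_digest and sequential_names and the sample byte strings are hypothetical.

```python
import hashlib


def dedupe_by_digest(blobs):
    """Keep the first occurrence of each distinct byte string, preserving order."""
    seen = set()
    unique = []
    for blob in blobs:
        digest = hashlib.sha256(blob).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(blob)
    return unique


def sequential_names(count, pattern='fans_avatar%s.jpg'):
    """File names numbered from 1, matching the naming scheme used in the article."""
    return [pattern % (n + 1) for n in range(count)]


# Stand-ins for downloaded image bytes (made-up data)
downloads = [b'default-avatar', b'custom-1', b'default-avatar', b'custom-2']
kept = dedupe_by_digest(downloads)
names = sequential_names(len(kept))
print(names)
```

Deduplicating before writing to disk avoids the save, compare, delete cycle, and each new image is checked with one set lookup instead of being compared against every saved file.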

Finally

That is all for the first part. The second part will be covered in the next article; if you are interested, follow me and then read the next article!

If you have read this far, dear readers, please leave a like and a comment. They are solid motivation for me to keep going!
