python urllib https

Published: 2025-07-28 03:48:36

❶ Differences between Python's httplib, urllib, and urllib2, and how to use them

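urllib and urllib2
urllib and urllib2 are both Python 2 modules for requesting URLs. The difference is that urllib2.urlopen can take a Request instance, which lets you set the headers of the request, whereas urllib.urlopen only takes a URL string.
This means that with urllib.urlopen alone you cannot, for example, spoof the User-Agent string.
urllib provides the urlencode function for building GET query strings, and urllib2 does not. That is why urllib and urllib2 are so often used together.
In Python 2 code, most HTTP requests are made through urllib2.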
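In Python 3 the two modules were merged: urlencode lives in urllib.parse and Request in urllib.request. The following is a minimal sketch of the combination described above, building a query string and attaching a custom User-Agent. The helper name build_get_request, the URL, the parameters, and the agent string are placeholders chosen for illustration, and nothing is sent over the network.

```python
import urllib.parse
import urllib.request


def build_get_request(base_url, params, user_agent):
    """Build (but do not send) a GET request with a query string and a custom User-Agent."""
    query = urllib.parse.urlencode(params)  # e.g. {'q': 'a b'} -> 'q=a+b'
    return urllib.request.Request(base_url + "?" + query,
                                  headers={"User-Agent": user_agent})


# Placeholder values, for illustration only.
req = build_get_request("http://www.example.com/search",
                        {"q": "python urllib", "page": 2},
                        "Mozilla/5.0 (compatible; demo)")
print(req.full_url)
print(req.get_header("User-agent"))  # Request stores header names capitalized this way
```

Passing req to urllib.request.urlopen() would then send it with the custom header.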

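httplib
httplib implements the client side of the HTTP and HTTPS protocols. It is usually not used directly; Python's higher-level modules (urllib, urllib2) build on its HTTP implementation.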

Basic urllib usage

urllib.urlopen(url[, data[, proxies]]):
import urllib

google = urllib.urlopen('http://www.google.com')
print 'http header:\n', google.info()
print 'http status:', google.getcode()
print 'url:', google.geturl()
for line in google:  # works just like iterating over a local file
    print line,
google.close()

For detailed usage, see
"Learning urllib" (urllib學習)

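Basic urllib2 usage
The simplest form:
import urllib2

response = urllib2.urlopen('http://www.douban.com')
html = response.read()
What actually happens:

1. urllib2.Request() builds the request; the req it returns is a fully constructed request. (In the simplest form above, urlopen() builds it for you from the URL string.)
2. urllib2.urlopen() sends the constructed request req and returns a file-like object, response, which holds everything the server sent back.
3. response.read() returns the HTML in the response, and response.info() returns additional information such as the response headers.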
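The three steps can be sketched as follows. This version is written for Python 3, where urllib2 became urllib.request; the helper name fetch and the default User-Agent value are assumptions made for illustration.

```python
import urllib.request


def fetch(url, user_agent="Mozilla/5.0 (compatible; demo)"):
    """Return (body_bytes, headers) for url, following the three steps above."""
    # 1. Build the request object.
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    # 2. Send it; the result is a file-like response object.
    response = urllib.request.urlopen(req)
    try:
        # 3. Read the body and the extra information (the headers).
        return response.read(), response.info()
    finally:
        response.close()
```

A call such as html, info = fetch('http://www.douban.com') would return the page body as bytes together with its headers.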

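❷ How do I scrape an HTTPS login page with Python? Thanks

My earlier attempts kept failing because I was not using the HTTPS-specific functions. After looking into it more carefully, there are a few points to watch. First, when simulating the login with a POST, the Cookie value in the headers matters, and different sites will have different requirements. Second, when fetching a page with GET, you must send the cookie from the Set-Cookie header of the POST response. Only then does the GET benefit from the successful login.

After writing the POST and GET parts, I also wrapped them in a simple command-line interface.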

import httplib
import sys
import urllib

file_text = "build_change.txt"
resultTable = dict()
host = 'buuuuuuu.knight.com'

def Login(username, password, csrf=''):
    """POST the login form over HTTPS and return the Set-Cookie header."""
    url = '/login/'
    values = {
        'username': username,
        'password': password,
        'next': '',
        'csrfmiddlewaretoken': csrf,
    }

    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36',
        'Content-Type': 'application/x-www-form-urlencoded',
        'Connection': 'keep-alive',
        'Cookie': 'csrftoken=%s' % csrf,
        'Referer': 'https://buuuuuuu.knight.com/login/',
        'Origin': 'https://buuuuuuu.knight.com',
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
    }
    values = urllib.urlencode(values)
    conn = httplib.HTTPSConnection(host, 443)
    conn.request("POST", url, values, headers)
    response = conn.getresponse()
    print 'Login:', response.status, response.reason
    # Uncomment to dump every response header:
    # for name, value in response.getheaders():
    #     print name, value
    return response.getheader("set-cookie")


def GetHtml(_url, cookie):
    """GET _url with the login cookie and save the body to file_text."""
    get_headers = {
        'Host': host,  # must match the host we connect to
        'Connection': 'keep-alive',
        'Cache-Control': 'max-age=0',
        'Cookie': cookie,
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36',
        'Accept-Language': 'zh-CN,zh;q=0.8,en;q=0.6',
    }
    conn = httplib.HTTPSConnection(host)
    conn.request("GET", _url, None, get_headers)
    res2 = conn.getresponse()
    print "Get %s:" % _url, res2.status, res2.reason
    # Uncomment to dump every response header:
    # for name, value in res2.getheaders():
    #     print name, value
    data = res2.read()
    fp = open(file_text, "w")
    fp.write(data)
    fp.close()


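def ParseHtml():
    """Collect every class="change-body" block of the saved page into resultTable."""
    fp = open(file_text, "r")
    content = fp.readline()
    _pos = 0
    while content:
        if content.find('class="change-body"') >= 0:
            topic = content.split(">")
            resultTable[_pos] = topic[1]
            # Append the following lines up to and including the closing </div>.
            while content:
                content = fp.readline()
                resultTable[_pos] = resultTable[_pos] + content
                if content.find("</div>") >= 0:
                    _pos = _pos + 1
                    break
        content = fp.readline()
    fp.close()
    print "Parse html success."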


def GenerateResultTxt():
    """Write the collected change blocks to build_change_result.txt."""
    f = open("build_change_result.txt", "w")
    for m in resultTable.keys():
        f.write("-------------------------------------------------------------------------------------------\n")
        f.write(resultTable[m])
    f.close()
    print "Generate result success: build_change_result.txt."


def Help():
    print '-h: help'
    print '-u: username (must)'
    print '-p: password (must)'
    print '-c: csrftoken (optional)'
    print '-s: sandbox build id (must)'
    print 'For example:'
    print '[1] python BuildChange.py -h'
    print '[2] python BuildChange.py -u u -p p -s s1 s2'
    print '[3] python BuildChange.py -u u -p p -c c -s s1 s2'


def ParseParam(com):
    """Parse sys.argv-style arguments, then log in, fetch, parse, and write the result."""
    length = len(com)
    username = ""
    password = ""
    csrf = ""
    sid1 = ""
    sid2 = ""
    if length == 2 or length == 8 or length == 10:
        if com[1] == '-h':
            Help()
            return  # do not fall through to the parameter error below
        for i in range(1, length):
            if com[i] == '-u' and i < (length - 1):
                username = com[i + 1]
            elif com[i] == '-p' and i < (length - 1):
                password = com[i + 1]
            elif com[i] == '-c' and i < (length - 1):
                csrf = com[i + 1]
            elif com[i] == '-s' and i < (length - 2):
                sid1 = com[i + 1]
                sid2 = com[i + 2]
    if username == "" or password == "" or sid1 == "" or sid2 == "":
        print '[Error] Parameter error!'
        print '[Error] You can use "python BuildChange.py -h" to see how to use this script.'
    else:
        # Login() defaults csrf to '', so one call covers both cases.
        cookie = Login(username, password, csrf)
        _url = "//changelog//between//%s//and//%s/" % (sid1, sid2)
        GetHtml(_url, cookie)
        ParseHtml()
        GenerateResultTxt()

# C:\Python27\python.exe C:\Users\knight\Desktop\build\BuildChange.py -u xux -p KKKKKKKK -s 1859409 1858525

if __name__ == "__main__":
    ParseParam(sys.argv)
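The cookie hand-off described at the start of this answer (POST the login, then send the Set-Cookie value back on the GET) can also be delegated to a cookie jar. Below is a minimal Python 3 sketch using http.cookiejar and urllib.request rather than the httplib approach of the script above. The function names, URLs, and form field names are hypothetical, and a real site may require extra fields such as a CSRF token.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return (opener, jar); the opener stores and resends cookies automatically."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login_then_get(opener, login_url, page_url, username, password):
    """POST the login form, then GET a page using the cookies set by the login."""
    form = urllib.parse.urlencode({"username": username, "password": password})
    # Passing data makes this a POST; any Set-Cookie in the reply goes into the jar.
    with opener.open(login_url, data=form.encode("utf-8")) as resp:
        resp.read()
    # The same opener sends the stored cookies back on the follow-up GET.
    with opener.open(page_url) as resp:
        return resp.read()


opener, jar = make_session()
```

With this approach the Cookie header no longer has to be copied by hand between the two requests.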

❸ Making HTTPS requests with urllib in Python 3

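I am new to Python and learning the basics of web scraping. I use Python 3.6.4 and am following the tutorial "Python爬蟲入門教程" (Introductory Python Crawler Tutorial).

Python 3.6 no longer has a urllib2 library (its functionality now lives in urllib.request and urllib.error), so I do not need to worry about the differences between urllib and urllib2 or when to use which.

I referred to the official document "HOWTO Fetch Internet Resources Using The urllib Package". For HTTP(S), GET and POST are the two most commonly used request methods, so I wrote the two small demos below. The URLs were picked at random; adapt them to your own case, and see the comments for the basic idea.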

POST request:
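A minimal sketch of a POST request with urllib.request in Python 3. The URL, the form fields, and the helper names are placeholders chosen for illustration. The key point is that the form data must be URL-encoded and then converted to bytes.

```python
import urllib.parse
import urllib.request


def build_post_request(url, form):
    """Build a POST request; urlopen() sends POST whenever data is not None."""
    data = urllib.parse.urlencode(form).encode("utf-8")  # str -> bytes, required in Python 3
    return urllib.request.Request(url, data=data)


def post(url, form, context=None):
    """Send the form and return the response body decoded as UTF-8."""
    with urllib.request.urlopen(build_post_request(url, form), context=context) as resp:
        return resp.read().decode("utf-8")


# Placeholder endpoint and fields, for illustration only; nothing is sent here.
req = build_post_request("https://www.example.com/login", {"user": "demo", "pwd": "secret"})
print(req.get_method(), req.data)
```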

GET request:
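A minimal sketch of a GET request in Python 3, where the parameters are URL-encoded and appended to the address to form full_url. The URL, the parameter, and the helper names are placeholders chosen for illustration.

```python
import urllib.parse
import urllib.request


def build_full_url(base_url, params):
    """Append URL-encoded query parameters to base_url."""
    return base_url + "?" + urllib.parse.urlencode(params)


def get(base_url, params, context=None):
    """Send a GET request and return the response body decoded as UTF-8."""
    full_url = build_full_url(base_url, params)
    with urllib.request.urlopen(full_url, context=context) as resp:
        return resp.read().decode("utf-8")


# Placeholder address and parameter, for illustration only; nothing is sent here.
full_url = build_full_url("https://www.example.com/s", {"wd": "python"})
print(full_url)
```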

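Note:
To skip certificate verification, create an unverified context with ssl and pass it to urlopen as the context argument:
urllib.request.urlopen(full_url, context=context)
Since Python 2.7.9 (and 3.4.3), urlopen verifies HTTPS certificates by default, so opening an HTTPS link whose certificate cannot be verified raises the following error:
ssl.SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:777)
So when urllib.request.urlopen opens an HTTPS link, the certificate either has to pass verification or verification has to be switched off explicitly for that call:
context = ssl._create_unverified_context()
Alternatively, switch verification off globally right after importing ssl:
ssl._create_default_https_context = ssl._create_unverified_context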
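The per-call variant can be sketched as below. The helper names make_context and open_https are assumptions made for illustration. An unverified context accepts any certificate, so the connection can be intercepted; it is best kept to testing, with ssl.create_default_context() as the verified default.

```python
import ssl
import urllib.request


def make_context(verify=True):
    """Return an SSL context; verify=False skips certificate and hostname checks."""
    if verify:
        return ssl.create_default_context()
    return ssl._create_unverified_context()


def open_https(url, verify=True):
    """Open url with the chosen context and return the raw body."""
    with urllib.request.urlopen(url, context=make_context(verify)) as resp:
        return resp.read()


ctx = make_context(verify=False)
print(ctx.verify_mode == ssl.CERT_NONE, ctx.check_hostname)
```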
