Python Web Scraping
3.2 urllib
Copyright notice: when reposting, please credit the source http://www.codingsoho.com/urllib
https://docs.python.org/2.7/library/urllib.html
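In Python 3, urllib was split into several parts and renamed to urllib.request, urllib.parse and urllib.error.
urllib.request.urlopen() in Python 3 is equivalent to urllib2.urlopen() in Python 2, and urllib.urlopen() has been removed.
The 2to3 tool automatically adapts these imports when converting source code to Python 3.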
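Code that must run on both versions commonly tries the Python 3 locations first and falls back to the Python 2 names. This is a minimal sketch. It assumes only urlencode and urlopen are needed, and `query` is just an illustrative variable:

```python
try:
    # Python 3: the functions moved into urllib.parse and urllib.request
    from urllib.parse import urlencode
    from urllib.request import urlopen
except ImportError:
    # Python 2: urlencode lives in urllib, the recommended urlopen in urllib2
    from urllib import urlencode
    from urllib2 import urlopen

# Same call works under either version once the names are imported
query = urlencode({'lng': 'cn', 'keyword': 'P100'})
```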
Basic operations
GET
http://www.healforce.com/cn/index.php?ac=search&at=result&lng=cn&keyword=P100&countnum=1
import urllib

# Encode the query parameters into a query string
params = urllib.urlencode({'ac': 'search', 'at': 'result', 'lng': 'cn', 'keyword': 'P100'})
# Append the query string to the URL; urlopen() without a data argument sends a GET request
f = urllib.urlopen("http://www.healforce.com/cn/index.php?%s" % params)
print f.read()
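In Python 3 the same GET request uses urllib.parse.urlencode() and urllib.request.urlopen(). This is a minimal sketch. The helper names build_search_url() and search() are hypothetical, and the 10-second timeout is an arbitrary choice:

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://www.healforce.com/cn/index.php"


def build_search_url(keyword):
    """Return the search URL with the query parameters from the example above."""
    params = urlencode({'ac': 'search', 'at': 'result', 'lng': 'cn', 'keyword': keyword})
    return BASE_URL + "?" + params


def search(keyword, timeout=10):
    """Send a GET request (no data argument) and return the raw response body as bytes."""
    with urlopen(build_search_url(keyword), timeout=timeout) as resp:
        return resp.read()
```

Calling search("P100") returns bytes, not str, so decode the result with the page's character set before treating it as text.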
POST
Python 2:
import urllib

# Encode the form fields; passing them as the data argument makes urlopen() send a POST
params = urllib.urlencode({'ac': 'search', 'at': 'result', 'lng': 'cn', 'keyword': 'EB03'})
f = urllib.urlopen("http://www.healforce.com/cn/index.php", params)
print f.read()
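In Python 3 the request body must be bytes rather than str, and passing it as data to urllib.request.Request turns the request into a POST. This is a minimal sketch. build_post_request() and post_form() are hypothetical helpers, and it is an assumption that this site accepts the search form via POST:

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post_request(url, fields):
    """Build a POST request whose body is the URL-encoded form fields."""
    # Python 3 requires the request body to be bytes, not str
    body = urlencode(fields).encode("ascii")
    req = Request(url, data=body)
    # urlopen() would add this header itself if missing; set it explicitly for clarity
    req.add_header("Content-Type", "application/x-www-form-urlencoded")
    return req


def post_form(url, fields, timeout=10):
    """Send the form as a POST request and return the raw response body."""
    with urlopen(build_post_request(url, fields), timeout=timeout) as resp:
        return resp.read()
```

Building the Request separately from sending it makes it easy to inspect the method, body and headers before any network traffic happens.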
Using an HTTP proxy
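Fiddler cannot capture the requests made above. They can only be monitored after a proxy is set (Fiddler listens on 127.0.0.1:8888 by default).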
import urllib

params = urllib.urlencode({'ac': 'search', 'at': 'result', 'lng': 'cn', 'keyword': 'P100'})
# Route HTTP traffic through Fiddler; note the proxy value has no "http://" scheme
proxies = {'http': '127.0.0.1:8888'}
f = urllib.urlopen("http://www.healforce.com/cn/index.php?%s" % params, proxies=proxies)
print f.read()
This raises an error:
IOError: [Errno url error] invalid proxy for http: '127.0.0.1:8888'
The error is raised here, in URLopener.open_unknown_proxy() in urllib.py:
def open_unknown_proxy(self, proxy, fullurl, data=None):
    """Overridable interface to open unknown URL type."""
    type, url = splittype(fullurl)
    raise IOError, ('url error', 'invalid proxy for %s' % type, proxy)
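Before dispatching, URLopener.open() passes the proxy value through splittype() to decide which open_* method to call. This minimal sketch re-implements Python 2.7's splittype() in Python 3 to show how the two forms of the proxy value are parsed. The regex mirrors the one in the 2.7 urllib source. This is a standalone copy for illustration, not an import from the library:

```python
import re

# Same pattern Python 2.7's urllib.splittype() compiles: the text before the first ":"
# counts as the URL type, provided it contains no "/" or ":"
_typeprog = re.compile(r'^([^/:]+):')


def splittype(url):
    """Illustrative copy: split 'type:rest' into (type, rest), or (None, url)."""
    match = _typeprog.match(url)
    if match:
        scheme = match.group(1)
        return scheme.lower(), url[len(scheme) + 1:]
    return None, url
```

splittype('127.0.0.1:8888') returns ('127.0.0.1', '8888'), so the URL type is '127.0.0.1' rather than 'http'. splittype('http://127.0.0.1:8888') returns ('http', '//127.0.0.1:8888').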
opener
For example, to set a proxy with an opener:
import urllib

proxies = {'http': '127.0.0.1:8888'}
opener = urllib.FancyURLopener(proxies)
# Note: this calls the module-level urllib.urlopen(), not opener.open(),
# so the opener and its proxy setting are never used
f = urllib.urlopen("http://www.healforce.com/cn/index.php")
print f.read()
However, accessing the page through the opener in the following way raises the same error:
IOError: [Errno url error] invalid proxy for http: '127.0.0.1:8888'
import urllib

proxies = {'http': '127.0.0.1:8888'}
opener = urllib.FancyURLopener(proxies)
params = urllib.urlencode({'ac': 'search', 'at': 'result', 'lng': 'cn', 'keyword': 'P100'})
# opener.open() does use the proxy, so the scheme-less proxy value fails again
f = opener.open("http://www.healforce.com/cn/index.php?%s" % params)
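Both errors have the same cause. The proxy value '127.0.0.1:8888' has no scheme, so splittype() parses '127.0.0.1' as the URL type. URLopener has no method named open_127.0.0.1, so open() falls back to open_unknown_proxy() and raises. Writing the proxy as 'http://127.0.0.1:8888' should avoid the error.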
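In Python 3, urlopen() no longer takes a proxies argument. The equivalent is an opener built from urllib.request.ProxyHandler and used through opener.open(), or installed globally with urllib.request.install_opener(). This is a minimal sketch. It assumes Fiddler is listening on its default 127.0.0.1:8888, and make_proxy_opener() and fetch_via_proxy() are hypothetical helper names:

```python
from urllib.request import ProxyHandler, build_opener

# Fiddler's default listening address, written with an explicit scheme
FIDDLER_PROXY = "http://127.0.0.1:8888"


def make_proxy_opener(proxy_url=FIDDLER_PROXY):
    """Build an opener that sends plain-HTTP requests through the given proxy."""
    handler = ProxyHandler({"http": proxy_url})
    # Passing a ProxyHandler instance replaces build_opener()'s default one
    return build_opener(handler)


def fetch_via_proxy(url, proxy_url=FIDDLER_PROXY, timeout=10):
    """Fetch url through the proxy and return the raw response body."""
    opener = make_proxy_opener(proxy_url)
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Only "http" is mapped here. Add an "https" entry to the dictionary if HTTPS requests should go through the proxy too.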