Python Web Crawler



3 HTTP Operations

Copyright notice: please credit the source when reposting: http://www.codingsoho.com/

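httplib implements the client side of the HTTP and HTTPS protocols. It is usually not used directly, because Python's higher-level modules urllib and urllib2 wrap its HTTP implementation.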

The commonly used URL handling modules are urllib and urllib2.

What is the difference between urllib and urllib2? An English-language article, "Python: difference between urllib and urllib2", describes it as follows:

You might be intrigued by the existence of two separate URL modules in Python - urllib and urllib2. Even more intriguing: they are not alternatives for each other. So what is the difference between urllib and urllib2, and do we need them both?
urllib and urllib2 are both Python modules that do URL request related stuff but offer different functionalities. Their two most significant differences are listed below:
 urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL. That means, you cannot masquerade your User Agent string etc.
 urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn't have such a function. This is one of the reasons why urllib is often used along with urllib2.
For other differences between urllib and urllib2 refer to their documentations, the links are given in the References section.
Tip: if you are planning to do HTTP stuff only, check out httplib2, it is much better than httplib or urllib or urllib2.
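
The two differences quoted above can be illustrated with a minimal sketch: urlencode builds the GET query string, and a Request object carries a custom User-Agent header. The URL, the query parameters, and the User-Agent value are hypothetical and chosen only for illustration. urllib2 exists only in Python 2; in Python 3 the same names live in urllib.request and urllib.parse, so the imports below try both. The request is only built, not sent, so the example needs no network access.

```python
# urllib2 and urllib.urlencode exist only in Python 2; Python 3 moved
# them to urllib.request and urllib.parse respectively.
try:
    from urllib2 import Request          # Python 2
    from urllib import urlencode
except ImportError:
    from urllib.request import Request   # Python 3
    from urllib.parse import urlencode

# Hypothetical endpoint and parameters, used only for illustration.
BASE_URL = "http://www.example.com/search"
params = [("q", "python crawler"), ("page", 2)]

# urlencode turns the key/value pairs into a GET query string.
query = urlencode(params)
url = BASE_URL + "?" + query

# A Request object can carry custom headers such as User-Agent.
# The header value here is an assumption, not a recommended string.
req = Request(url, headers={"User-Agent": "Mozilla/5.0 (compatible; demo-crawler)"})

print(query)
print(req.get_full_url())
print(req.get_header("User-agent"))  # Request stores header names capitalized
```

Passing req to urlopen (urllib2.urlopen in Python 2, urllib.request.urlopen in Python 3) would actually send the request; that step is left out here so the sketch stays offline.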

For a higher-level HTTP client interface, the Requests package is recommended.