Job Search Engine
2.14 User-Triggered Custom Crawler (1) - Entering and Displaying User Data
Copyright notice: when reposting, please credit the source: http://www.codingsoho.com/用户触发数据爬虫定制 (1) - 用户数据的输入和显示
A look at Liepin's search query parameters
Here is the simplest possible search, using only the keyword software:
https://www.liepin.com/zhaopin/?industries=&subIndustry=&dqs=&salary=&jobKind=&pubTime=&compkind=&compscale=&industryType=&searchType=1&clean_condition=&isAnalysis=&init=1&sortFlag=15&flushckid=0&fromSearchBtn=1&headckid=181ea51414511b11&d_headId=297e843e68fa1dc1781b19720ffee3dd&d_ckId=297e843e68fa1dc1781b19720ffee3dd&d_sfrom=search_fp_nvbar&d_curPage=0&d_pageSize=40&siTag=1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw&key=software
https://www.liepin.com/zhaopin/?
industries=
subIndustry=
dqs=
salary=
jobKind=
pubTime=
compkind=
compscale=
industryType=
searchType=1
clean_condition=
isAnalysis=
init=1
sortFlag=15
flushckid=0
fromSearchBtn=1
headckid=181ea51414511b11
d_headId=297e843e68fa1dc1781b19720ffee3dd
d_ckId=297e843e68fa1dc1781b19720ffee3dd
d_sfrom=search_fp_nvbar
d_curPage=0
d_pageSize=40
siTag=1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw
key=software
A search with every option set
First-page link:
https://www.liepin.com/zhaopin/?isAnalysis=&dqs=020010010&pubTime=7&salary=50$100&subIndustry=001004&industryType=industry_01&compscale=030&key=software&init=-1&searchType=1&headckid=82952d5916bbe9a0&flushckid=1&compkind=020&fromSearchBtn=2&sortFlag=15&ckid=54979e566e902845&jobKind=2&industries=010&clean_condition=&siTag=-foQupVsrPkdeHiGETnvuQ~pDiv2mpWvb7HnzSpHT_esw&d_sfrom=search_prime&d_ckId=bd38bd9ff2e5fb9d18c83d9c7bd719bd&d_curPage=0&d_pageSize=40&d_headId=a85a1b9b951a94a0b3e0a15495800d6f
https://www.liepin.com/zhaopin/?
isAnalysis=
dqs=020010010
pubTime=7
salary=50$100
subIndustry=001004
industryType=industry_01
compscale=030
key=software
init=-1
searchType=1
headckid=82952d5916bbe9a0
flushckid=1
compkind=020
fromSearchBtn=2
sortFlag=15
ckid=54979e566e902845
jobKind=2
industries=010
clean_condition=
siTag=-foQupVsrPkdeHiGETnvuQ~pDiv2mpWvb7HnzSpHT_esw
d_sfrom=search_prime
d_ckId=bd38bd9ff2e5fb9d18c83d9c7bd719bd
d_curPage=0
d_pageSize=40
d_headId=a85a1b9b951a94a0b3e0a15495800d6f
Second-page link:
https://www.liepin.com/zhaopin/?isAnalysis=&dqs=020010010&pubTime=7&salary=50%24100&subIndustry=001004&industryType=industry_01&compscale=030&key=software&init=-1&searchType=1&headckid=82952d5916bbe9a0&compkind=020&fromSearchBtn=2&sortFlag=15&ckid=f4cf188239c69b4e&degradeFlag=1&jobKind=2&industries=010&clean_condition=&siTag=-foQupVsrPkdeHiGETnvuQ~2GEH_2GU37EZKEJJnBMLPg&d_sfrom=search_prime&d_ckId=970b4ac9a9280ef771b4f820be64bb63&d_curPage=0&d_pageSize=40&d_headId=a85a1b9b951a94a0b3e0a15495800d6f&curPage=1
Last-page link:
https://www.liepin.com/zhaopin/?isAnalysis=&dqs=020010010&pubTime=7&salary=50%24100&subIndustry=001004&industryType=industry_01&compscale=030&key=software&init=-1&searchType=1&headckid=82952d5916bbe9a0&compkind=020&fromSearchBtn=2&sortFlag=15&ckid=f4cf188239c69b4e&degradeFlag=1&jobKind=2&industries=010&clean_condition=&siTag=-foQupVsrPkdeHiGETnvuQ~2GEH_2GU37EZKEJJnBMLPg&d_sfrom=search_prime&d_ckId=970b4ac9a9280ef771b4f820be64bb63&d_curPage=1&d_pageSize=40&d_headId=a85a1b9b951a94a0b3e0a15495800d6f&curPage=24
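One way to see which parameters change from page to page is to parse two links and compare their query strings. The sketch below uses Python 3's standard urllib.parse module. The helper name diff_query is hypothetical, and the two URLs are abbreviated versions of the first-page and second-page links (only a subset of their parameters), so treat it as an illustration rather than part of the project code.

```python
from urllib.parse import urlsplit, parse_qsl


def diff_query(url_a, url_b):
    """Return {name: (value_in_a, value_in_b)} for parameters that differ."""
    # keep_blank_values=True keeps empty parameters such as "isAnalysis=".
    a = dict(parse_qsl(urlsplit(url_a).query, keep_blank_values=True))
    b = dict(parse_qsl(urlsplit(url_b).query, keep_blank_values=True))
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Abbreviated versions of the first-page and second-page links (assumption:
# only a few of the original parameters are kept to keep the example short).
first_page = (
    "https://www.liepin.com/zhaopin/?isAnalysis=&salary=50$100&key=software"
    "&flushckid=1&ckid=54979e566e902845&d_curPage=0"
)
second_page = (
    "https://www.liepin.com/zhaopin/?isAnalysis=&salary=50%24100&key=software"
    "&ckid=f4cf188239c69b4e&degradeFlag=1&d_curPage=0&curPage=1"
)

changes = diff_query(first_page, second_page)
for name, (before, after) in changes.items():
    print(name, before, after)
```

salary does not show up as a difference: 50$100 and 50%24100 are the same value once percent-decoded, which suggests the site simply encodes the value on later pages.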
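From this analysis, the query parameters mean the following:
init=1 basic search only; becomes -1 when advanced search options are used
key=software search keyword
industryType= industry
industries= industry subdivision
subIndustry= ?
dqs= city / region
salary= salary
jobKind= job (position) type
pubTime= publication time
compkind= company type
compscale= company size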
searchType=1
clean_condition=
flushckid=0
fromSearchBtn=1 searched from the search box at the top; 2 when reached from the next-page link
sortFlag=15
ckid=
d_ckId=297e843e68fa1dc1781b19720ffee3dd
headckid=181ea51414511b11
d_headId=297e843e68fa1dc1781b19720ffee3dd
siTag=1B2M2Y8AsgTpgAmY7PhCfg~fA9rXquZc5IkJpXC-Ycixw
d_sfrom=search_fp_nvbar, search_prime
d_curPage=0
d_pageSize=40
isAnalysis=
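The list above can be captured as a small lookup so that a pasted link can be summarized in readable form. This is a minimal Python 3 sketch: the names PARAM_LABELS and describe_filters are hypothetical, and the labels simply restate the meanings listed above.

```python
from urllib.parse import urlsplit, parse_qsl

# Labels restate the parameter meanings listed above (hypothetical mapping).
PARAM_LABELS = {
    "key": "keyword",
    "industryType": "industry",
    "dqs": "city / region",
    "salary": "salary",
    "jobKind": "job type",
    "pubTime": "publication time",
    "compkind": "company type",
    "compscale": "company size",
}


def describe_filters(href):
    """Return {label: value} for the known, non-empty filters in href."""
    # parse_qsl drops parameters with blank values by default.
    pairs = parse_qsl(urlsplit(href).query)
    return {PARAM_LABELS[name]: value for name, value in pairs if name in PARAM_LABELS}


summary = describe_filters(
    "https://www.liepin.com/zhaopin/"
    "?dqs=020010010&pubTime=7&salary=50$100&key=software&init=-1&compkind="
)
print(summary)
```

Parameters that are not in the lookup (such as init) and parameters left empty (such as compkind here) are skipped, so the summary only shows filters the user actually set.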
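Creating the form
The original plan was to let the user fill in every search field. It turned out that Liepin converts those values into its own codes, so I changed approach: the user pastes the search hyperlink directly and the search is driven by that link. The form therefore has only two fields, the email address and the hyperlink.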
from django import forms
# ugettext_lazy was deprecated in Django 3.0 and removed in 4.0; use gettext_lazy there.
from django.utils.translation import ugettext_lazy as _


class JobScrapRegisterForm(forms.Form):
    mail = forms.EmailField(label=_('mail'), required=True)
    # Per-field search inputs from the original plan, kept for reference:
    # keyword = forms.CharField(label=_('keyword'), required=False)
    # industry = forms.CharField(label=_('industry'), required=False)
    # region = forms.CharField(label=_('region'), required=False)
    # salary = forms.CharField(label=_('salary'), required=False)
    # published = forms.CharField(label=_('published'), required=False)
    # position_type = forms.CharField(label=_('position type'), required=False)
    # company_size = forms.CharField(label=_('company size'), required=False)
    # company_type = forms.CharField(label=_('company type'), required=False)
    link = forms.CharField(label=_('href'), widget=forms.Textarea, required=True)
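Because the link field accepts any text, it may be worth checking that the submitted value really looks like a Liepin search URL before parsing it. The following is a minimal sketch in plain Python 3. The function name is_liepin_search_link and the exact checks (host www.liepin.com, path /zhaopin/, non-empty query string) are assumptions based on the example links above, not something the original project does.

```python
from urllib.parse import urlsplit


def is_liepin_search_link(href):
    """Return True if href looks like a Liepin job-search URL with a query."""
    parts = urlsplit(href.strip())
    return (
        parts.scheme in ("http", "https")
        and parts.netloc == "www.liepin.com"
        and parts.path.rstrip("/") == "/zhaopin"
        and bool(parts.query)
    )


print(is_liepin_search_link("https://www.liepin.com/zhaopin/?key=software"))
print(is_liepin_search_link("https://example.com/zhaopin/?key=software"))
```

A check like this could be called from a clean_link method on the form, raising forms.ValidationError when it returns False.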
Views
Form view
This section introduces the form view, which inherits from FormView.
Authorization mixin
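from django.contrib.auth.decorators import login_required
from django.utils.decorators import method_decorator
from django.views.generic import FormView

class LoginRequiredMixin(object):
    @classmethod
    def as_view(cls, *args, **kwargs):
        # Wrap the generated view so anonymous users are redirected to the login page.
        view = super(LoginRequiredMixin, cls).as_view(*args, **kwargs)
        return login_required(view)

    @method_decorator(login_required)
    def dispatch(self, request, *args, **kwargs):
        return super(LoginRequiredMixin, self).dispatch(request, *args, **kwargs)

class JobScrapRegisterView(LoginRequiredMixin, FormView):
    pass  # the body is shown in the full listing later in this section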
Parsing and building the link
def parse_lielin_search_link(href):
    """Return the query parameters of a Liepin search link as a dict."""
    keywords_extracted = {}
    parts = href.split("?", 1)
    if len(parts) == 1:
        # No query string in the link.
        return keywords_extracted
    for pair in parts[1].split("&"):
        if not pair:
            continue
        # Split on the first "=" only, so a value containing "=" stays intact.
        # A parameter without "=" gets an empty value.
        name, _sep, value = pair.partition("=")
        keywords_extracted[name] = value
    return keywords_extracted
def build_liepiin_search_link(keywords_dict):
    """Rebuild a Liepin search link from the parsed query parameters."""
    base_url = "https://www.liepin.com/zhaopin/"
    fields = [
        "industryType",
        "jobKind",
        "sortFlag",
        "degradeFlag",
        "industries",
        "salary",
        "compscale",
        "key",
        "clean_condition",
        "headckid",
        "d_pageSize",
        "siTag",
        "d_headId",
        "d_ckId",
        "d_sfrom",
        "d_curPage"
    ]
    # Values are copied exactly as parsed; missing fields become empty.
    pairs = [field + "=" + keywords_dict.get(field, "") for field in fields]
    # Always start from the first result page.
    pairs.append("curPage=0")
    return base_url + "?" + "&".join(pairs)
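The same parse-and-rebuild round trip can also be sketched with Python 3's standard library, which takes care of percent-decoding and re-encoding. This is an alternative illustration, not the code used in this project: the names parse_search_link and build_search_link are hypothetical, and the page argument assumes that curPage is the zero-based page index, as the example links suggest (curPage=1 on the second page).

```python
from urllib.parse import urlsplit, parse_qsl, urlencode

BASE_URL = "https://www.liepin.com/zhaopin/"


def parse_search_link(href):
    """Parse the query string of href into a dict of percent-decoded values."""
    # keep_blank_values=True preserves empty parameters such as "dqs=".
    return dict(parse_qsl(urlsplit(href).query, keep_blank_values=True))


def build_search_link(params, page=0):
    """Build a search link from params, setting curPage to the given page."""
    query = dict(params)  # copy so the caller's dict is not modified
    query["curPage"] = str(page)
    # urlencode percent-encodes the values again, e.g. "$" becomes "%24".
    return BASE_URL + "?" + urlencode(query)


params = parse_search_link(BASE_URL + "?dqs=&salary=50%24100&key=software")
link = build_search_link(params, page=2)
print(link)
```

Unlike the hand-written version, this keeps every parameter found in the pasted link instead of a fixed field list, which may or may not be what the crawler needs.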
Complete code
from django.http import HttpResponseRedirect


class JobScrapRegisterView(LoginRequiredMixin, FormView):
    form_class = JobScrapRegisterForm
    template_name = 'jse/job_scrap_register_form.html'

    def post(self, *args, **kwargs):
        form = self.get_form()
        if not form.is_valid():
            return self.form_invalid(form)
        href = form.cleaned_data.get('link', None)
        if href:
            # Stay on the current page and show the parsed result next to the form.
            context = self.get_context_data(form=form)
            href_dict = parse_lielin_search_link(href)
            context['href_dict'] = href_dict
            context['build_href'] = build_liepiin_search_link(href_dict)
            return self.render_to_response(context)
        # Not strictly necessary; the base class would redirect the same way.
        return HttpResponseRedirect(self.get_success_url())

    def get_success_url(self, *args, **kwargs):
        return self.request.get_full_path()
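The view first validates the form and, if it is valid, parses the link. A FormView would normally redirect to success_url at this point, but here I want to stay on the current page, so the view calls self.render_to_response to render the current page with the parsed result.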
Template
The template has two parts: the form on the left and the parsed dictionary on the right.
jse/job_scrap_register_form.html
{% extends "base.html" %}
{% load staticfiles %}
{% load crispy_forms_tags %}
{% block content %}
<div class="row">
  <div class="col-sm-6">
    <form method="post" action="">{% csrf_token %}
      {{form|crispy}}
      <input type="submit" name="submit" class="btn btn-primary">
    </form>
  </div>
  <div class="col-sm-6">
    <table class="table table-bordered">
      <caption>Job Search Query</caption>
      <thead>
        <tr>
          <th>Key</th>
          <th>Value</th>
        </tr>
      </thead>
      <tbody>
        {% for key, value in href_dict.items %}
        <tr>
          <td>{{key}}</td>
          <td>{{value}}</td>
        </tr>
        {% endfor %}
      </tbody>
    </table>
    <a href="{{build_href}}">Generated link</a>
  </div>
</div>
{% endblock %}