
Scraping the Domains of 360 Butian Vendors

360 Butian has been running its public-welfare Security Month event these past few days, and the rewards are quite generous.

The plan is to scrape the URLs of the participating vendors.

Looking at the page source, there is no trace of the vendor listings at all.

So I used Burp to capture and analyze the traffic.

The company_id field in the response data is the vendor's ID.

With that ID you can open a URL like the following and see the vendor's link on the page:

https://butian.360.cn/Loo/submit?cid=59567

My approach: first fetch each page of results and save the responses to a file, then use a regex to pull out the company_id values, and finally visit each ID once more and regex the domain out of the page.

#!/usr/bin/env python
# -*- coding: utf-8 -*-
from urllib import request, parse

url = 'https://butian.360.cn/Home/Active/company'

data = {
    'type': '1',
    'p': '1'}

headers = {
    'Host': 'butian.360.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0',
    'Accept': 'application/json, text/javascript, */*; q=0.01',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Referer': 'https://butian.360.cn/Home/Active/hd.html',
    'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
    'X-Requested-With': 'XMLHttpRequest',
    'Content-Length': '10',  # hard-coded length, the source of the paging bug described below
    'Cookie': 'xxxxxx',
    'Connection': 'close'
}

with open('360s.txt','a') as f:
    for i in range(1, 32):
        data['p'] = "%s" % (i)
        print(data)
        datas = parse.urlencode(data).encode('utf-8')
        req = request.Request(url, headers=headers, data=datas)
        page = request.urlopen(req).read()
        page = page.decode('utf-8')
        f.write(page)
        print(page)

After page 9 the scrape started going wrong: requesting page 11 returned page 1's content, and page 25 returned page 2's.

That was strange, so I captured the traffic again to compare requests.

The culprit was this header:

Content-Length: 10

Changing it to

Content-Length: 11

lets you fetch page 10 and onward, but then pages 1-9 can no longer be fetched. Curious.
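The most likely explanation (my guess from the symptoms): the encoded body type=1&p=1 is exactly 10 bytes, while type=1&p=10 is 11 bytes, so a fixed Content-Length: 10 makes the server read only the first 10 bytes and drop the last digit of the page number, which is exactly why page 11 comes back as page 1 and page 25 as page 2. A cleaner fix, sketched below on that assumption, is to not hard-code the header at all and let urllib compute it from the actual body of each request:

# Sketch: reuse the url and headers dicts from the script above, but drop the
# fixed Content-Length so urllib derives it from each request body.
from urllib import request, parse

headers.pop('Content-Length', None)

for i in range(1, 32):
    body = parse.urlencode({'type': '1', 'p': str(i)}).encode('utf-8')
    print(i, len(body))   # 10 bytes for p=1..9, 11 bytes from p=10 onward
    req = request.Request(url, headers=headers, data=body)
    page = request.urlopen(req).read().decode('utf-8')
    # save/parse `page` the same way as in the script above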

Once the scraping is done, the next step is to use an re regex to match out each vendor's ID.

import re
data = open('360s.txt').read()             # the responses saved by the script above
res = re.compile(r'"company_id":"(.*?)"')
rea = res.findall(data)

That does the trick.
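The post doesn't show the intermediate step, but the script below reads one ID per line from 360.txt, so presumably the matches get written out along these lines (a sketch; the filename simply mirrors what the next script opens):

# Write one company_id per line into 360.txt for the next script to read.
with open('360.txt', 'w') as f:
    for cid in rea:
        f.write(cid + '\n')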

Then it's just a matter of visiting each ID in turn.

import re
import time
import urllib.request

url = 'https://butian.360.cn/Loo/submit?cid='

headers = {
    'Host': 'butian.360.cn',
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
    'Cookie': 'xxxxxxx',
    'Connection': 'close',
    'Upgrade-Insecure-Requests': '1'
}

with open('360url.txt', 'a') as f1, \
     open('360errs.txt', 'a') as f0, \
     open('360.txt', 'r') as f:
    for i in f.readlines():
        cid = i.strip()                      # drop the trailing newline
        urls = url + cid
        urlre = urllib.request.Request(urls, headers=headers)
        try:
            page = urllib.request.urlopen(urlre).read().decode('utf-8')
            # the "所属域名" (domain) field of the submission form carries the vendor's domain
            res = re.compile(r'<li><span>所属域名:</span><input class="input-xlarge" type="text" name="host" placeholder="请输入厂商域名" value="(.*?)" />')
            rea = res.findall(page)
            print(rea)
            f1.write(rea[0] + '\n')
            time.sleep(2)                    # throttle so the site doesn't start erroring
        except Exception:
            print(urls + "-------------->错误")
            f0.write(cid + '\n')

Note that the request headers can't be the same ones used above.

Requesting too fast causes errors, so I sleep 2 seconds between requests.

In the end the job got done.

But the big names in the group said the bare main domains were of no use.

Nothing to be done about that...

Since they turned their noses up at the URLs I had scraped, I had to figure out how to get subdomains instead.

I remembered that a while back I had posted a script on T00ls (吐司) that calls a Baidu API to pull subdomains.

All I had to do was tweak that script a bit.

import urllib.request

def domain():
    with open('360data.txt', 'a') as f0, open('360url.txt', 'r') as f1:
        for url in f1.readlines():
            url = url.strip()                # drop the trailing newline
            print(url)
            # Baidu's getRelatedSites endpoint is used here to discover subdomains
            urls = "http://ce.baidu.com/index/getRelatedSites?site_address=" + url
            data = urllib.request.urlopen(urls).read().decode('utf-8')
            f0.write(data)
            print(data)

domain()

That fetches the subdomains, but the returned data still needs a bit of processing.

import re

with open('360data.txt', 'r') as f:
    pattern = re.compile(r'":"(.*?)","')
    result = pattern.findall(f.read())       # read the whole file, not just the first line
with open('360url2.txt', 'a') as out:        # open the output once instead of per match
    for i in result:
        out.write("%s\n" % (i))
        print(i)

The results I ended up with were pretty decent.

I don't actually feel like hunting bugs anyway; all of this was purely for showing off. Show off and run...

2018-11-03