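Earlier we covered a Python crawler that scrapes images from Douban groups and submits them to a Chevereto image host through its API. Later, with some time on my hands, I wrote a Golang script that scrapes Douban group images in the same way.
Related post: Uploading images through the API in Chevereto Free, an illustrated tutorial
Image host address: http://788to.com
Before using the script, set up a Golang environment and install the one required package: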
go get github.com/PuerkitoBio/goquery
The script accepts three parameters:
-u the URL of the group, for example: https://www.douban.com/group/meituikong/discussion?start=
-e the start= value of the last page
-k the Chevereto API key
A complete example run:
go run get-douban-image.go -u="https://www.douban.com/group/265201/discussion?start=" -e="700" -k="laoji.org"
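To make the meaning of -u and -e concrete, here is a minimal sketch of how the listing-page URLs follow from the two flags. It assumes each listing page holds 25 topics, which is what the step of 25 in the script's main loop implies. The helper name pageURLs is made up for this illustration and does not exist in the script.

```go
package main

import (
	"fmt"
	"strconv"
)

// pageSize mirrors the step of 25 used by the script's main loop.
const pageSize = 25

// pageURLs (hypothetical helper) returns the listing URLs that would be
// requested for a base URL (-u) and an end value (-e).
func pageURLs(base string, end int) []string {
	var urls []string
	for start := 0; start < end; start += pageSize {
		urls = append(urls, base+strconv.Itoa(start))
	}
	return urls
}

func main() {
	base := "https://www.douban.com/group/265201/discussion?start="
	for _, u := range pageURLs(base, 100) {
		fmt.Println(u)
	}
}
```

Because the loop condition is a strict less-than, -e="700" covers start=0 through start=675 (28 pages); the page at start=700 itself is not requested.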
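Git repository: https://github.com/qsbaq/doubanImage
The source code follows. The listing below is for demonstration only; treat the code in the Git repository as authoritative: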
package main

import (
	"encoding/json"
	"flag"
	"fmt"
	"io/ioutil"
	"log"
	"net/http"
	"net/url"
	"regexp"
	"strconv"
	"sync"
	"time"

	"github.com/PuerkitoBio/goquery"
)
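
// GetUrl fetches url and returns the response body, or nil if the request
// or the read fails.
func GetUrl(url string) []byte {
	ret, err := http.Get(url)
	if err != nil {
		// ret is nil on error, so return before touching ret.Body.
		log.Println(url, err)
		return nil
	}
	defer ret.Body.Close()
	data, err := ioutil.ReadAll(ret.Body)
	if err != nil {
		log.Println(url, err)
		return nil
	}
	return data
}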
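
// getImage downloads one topic page, extracts every large group-topic image
// URL from it and submits each one to the Chevereto upload API with key k.
func getImage(image_url string, k string) {
	data := GetUrl(image_url)
	body := string(data)
	// Escape the dots and keep each wildcard inside one URL segment, so two
	// images on the same line are not merged into a single match.
	part := regexp.MustCompile(`https://[^/"\s]+\.doubanio\.com/view/group_topic/large/public/[^/"\s]+\.jpg`)
	match := part.FindAllString(body, -1)
	for _, value := range match {
		submit_url := "http://788to.com/api/1/upload/?key=" + url.QueryEscape(k) + "&source=" + url.QueryEscape(value)
		fmt.Println(submit_url)
		return_json := GetUrl(submit_url)
		res := make(map[string]interface{})
		if err := json.Unmarshal(return_json, &res); err != nil {
			log.Printf("%s -> invalid response: %v\n", value, err)
			continue
		}
		log.Printf("%s -> %v \n", value, res["status_code"])
	}
}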
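
// getGroupList fetches one discussion listing page and processes every topic
// linked from it. It marks the page as done on wg when it returns.
func getGroupList(target_url string, k string) {
	defer wg.Done()
	fmt.Printf("Begin Url : %s\n", target_url)
	doc, err := goquery.NewDocument(target_url)
	if err != nil {
		// Skip this page rather than aborting the other goroutines.
		log.Println(target_url, err)
		return
	}
	// Find the topic links in the listing table
	doc.Find("td.title a").Each(func(i int, s *goquery.Selection) {
		// For each topic found, follow its href and collect the images
		href, IsExist := s.Attr("href")
		if IsExist {
			getImage(href, k)
		}
	})
}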

// wg tracks the listing-page goroutines started by main.
var wg sync.WaitGroup

func main() {
	k := flag.String("k", "laoji.org", "Chevereto Key")
	endStartInt := flag.Int("e", 100, "End Start Int Value")
	defaultUrl := flag.String("u", "https://www.douban.com/group/meituikong/discussion?start=", "Group Url")
	flag.Parse()
	// Step the start offset by 25, one listing page per iteration.
	for i := 0; i < *endStartInt; i = i + 25 {
		wg.Add(1)
		go getGroupList(*defaultUrl+strconv.Itoa(i), *k)
		// Launch one page every 3 seconds.
		time.Sleep(3 * time.Second)
	}
	wg.Wait()
}
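
The two string-handling steps inside getImage, matching image URLs and building the upload request, can be tried offline without touching Douban or the image host. The sketch below is an illustration rather than the repository's code: the helper names extractImageURLs and buildUploadURL are invented here, the HTML fragment is made up, and the pattern is a stricter variant of the one in the listing (dots escaped, wildcards limited to a single URL segment).

```go
package main

import (
	"fmt"
	"net/url"
	"regexp"
)

// imagePattern matches large group-topic images hosted on doubanio.com.
// The wildcards exclude '/', '"' and whitespace so each match ends with
// the URL it started in.
var imagePattern = regexp.MustCompile(`https://[^/"\s]+\.doubanio\.com/view/group_topic/large/public/[^/"\s]+\.jpg`)

// extractImageURLs (hypothetical helper) returns every image URL in html.
func extractImageURLs(html string) []string {
	return imagePattern.FindAllString(html, -1)
}

// buildUploadURL (hypothetical helper) assembles the upload request for one
// image, letting url.Values escape both the key and the source URL.
func buildUploadURL(apiBase, key, source string) string {
	q := url.Values{}
	q.Set("key", key)
	q.Set("source", source)
	return apiBase + "?" + q.Encode()
}

func main() {
	// Made-up fragment with two images on the same line.
	html := `<img src="https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg"><img src="https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg">`
	for _, img := range extractImageURLs(html) {
		fmt.Println(buildUploadURL("http://788to.com/api/1/upload/", "laoji.org", img))
	}
}
```

With the unescaped, greedy (.*) form, two images on one line would come back as a single match running from the first https:// to the last .jpg, which is why the character classes are narrowed here.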
Output from a run of the script:
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p61541380.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p44724331.jpg -> 200
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p65569545.jpg -> 200
2017/02/10 08:18:10 https://img1.doubanio.com/view/group_topic/large/public/p44724327.jpg -> 200
Begin Url : https://www.douban.com/group/265201/discussion?start=500
2017/02/10 08:18:10 https://img3.doubanio.com/view/group_topic/large/public/p47029205.jpg -> 200
2017/02/10 08:18:10 https://img5.doubanio.com/view/group_topic/large/public/p33682186.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979344.jpg -> 200
2017/02/10 08:18:11 https://img5.doubanio.com/view/group_topic/large/public/p47029206.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p64979345.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p48717685.jpg -> 200
2017/02/10 08:18:11 https://img3.doubanio.com/view/group_topic/large/public/p50772901.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p45223799.jpg -> 200
2017/02/10 08:18:11 https://img1.doubanio.com/view/group_topic/large/public/p47758309.jpg -> 200
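
The 200 at the end of each log line is the status_code field of the JSON reply from the upload API. If you would rather work with a typed value than a map, something along these lines may help. Only status_code is confirmed by the script; the status_txt and image.url fields, the sample reply body and the names uploadReply and parseReply are assumptions made for this sketch, so check them against what your Chevereto instance actually returns.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// uploadReply models part of the upload API's JSON reply.
// status_code is used by the script; status_txt and image.url are
// assumed field names.
type uploadReply struct {
	StatusCode int    `json:"status_code"`
	StatusTxt  string `json:"status_txt"`
	Image      struct {
		URL string `json:"url"`
	} `json:"image"`
}

// parseReply (hypothetical helper) decodes a reply body and reports whether
// status_code equals 200.
func parseReply(body []byte) (uploadReply, bool, error) {
	var r uploadReply
	if err := json.Unmarshal(body, &r); err != nil {
		return r, false, err
	}
	return r, r.StatusCode == 200, nil
}

func main() {
	// Made-up reply body, used only for illustration.
	sample := []byte(`{"status_code":200,"status_txt":"OK","image":{"url":"http://788to.com/images/example.jpg"}}`)
	r, ok, err := parseReply(sample)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(ok, r.StatusCode, r.Image.URL)
}
```

Returning the decode error separately makes it possible to tell a failed request (empty or non-JSON body) apart from an upload the server rejected.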