
Sharing the source code for an image collector (crawler) in Go

by 어느해겨울 2021. 12. 23.

 

Building an image crawler (image collector) in golang/Go

 

The test target this time is https://hani.co.kr. There is no particular reason; it is simply the sample URL I used in class.

Substitute whatever target URL you like.

 

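package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/PuerkitoBio/goquery"
)

func main() {
	// Fetch the target page.
	response, err := http.Get("https://www.hani.co.kr")
	if err != nil {
		log.Fatal(err)
	}
	defer response.Body.Close()

	// Stop early if the server did not answer with 200 OK.
	if response.StatusCode != http.StatusOK {
		log.Fatalf("unexpected HTTP status: %s", response.Status)
	}

	// Parse the response body into a queryable document.
	document, err := goquery.NewDocumentFromReader(response.Body)
	if err != nil {
		log.Fatal("Error loading HTTP response body. ", err)
	}

	// Print the src attribute of every <img> element.
	document.Find("img").Each(func(index int, element *goquery.Selection) {
		imgSrc, exists := element.Attr("src")
		if exists {
			fmt.Println(imgSrc)
		}
	})
}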

 

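One of Go's biggest strengths is that a Go module hosted on GitHub can be imported directly by its repository path (the go tool downloads it once it is added as a dependency, for example with go get).

Searching GitHub or the official Go package site (pkg.go.dev) for the module you want will turn up the official packages as well as most GitHub-hosted modules.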

import "github.com/PuerkitoBio/goquery"

To change the target before running, edit the following line.

response, err := http.Get("enter the target URL")
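
Rather than editing the source every time, the target could also be read from the command line. The following is only a minimal sketch and not part of the original program: the helper name targetURL and the constant defaultTarget are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"os"
)

// defaultTarget is the sample URL used in this post (hypothetical constant name).
const defaultTarget = "https://www.hani.co.kr"

// targetURL is a hypothetical helper: it returns the first command-line
// argument if one was given, otherwise the fallback URL.
func targetURL(args []string, fallback string) string {
	if len(args) > 1 && args[1] != "" {
		return args[1]
	}
	return fallback
}

func main() {
	// os.Args[0] is the program name; os.Args[1], if present, is the URL.
	fmt.Println("target:", targetURL(os.Args, defaultTarget))
}
```

The returned string can then be passed to http.Get in place of the hard-coded URL.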

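Finally, there is no need to collect only images.

document.Find("img") simply picks the img tags out of the data returned by the HTTP GET and lists them, so passing a different selector lets you target other tags, and the matched elements can in turn be filtered by a keyword or phrase.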

 

You can build the program and run the binary, or run it through the debugger. Either way, the output looks like this.

https://img.hani.co.kr/section-image/21/news/notion-logo.png
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1208/53_16389166381133_20211207503702.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1124/53_16377053591944_20211123503952.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1222/53_16401607516553_20211222503100.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1222/53_16401603573818_20211222503096.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1222/53_16401763037188_20211222503701.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1222/53_16401511736092_20211222501979.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1222/53_16401782028595_20211222503735.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1222/53_16401767510812_20211222503719.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1222/53_16401409898816_20211222501428.jpg
//flexible.img.hani.co.kr/flexible/normal/212/112/imgdb/original/2021/1222/20211222503006.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1217/53_16397303258139_20211217502227.jpg
//flexible.img.hani.co.kr/flexible/normal/212/127/imgdb/child/2021/1214/53_16394827423338_20211214503565.jpg
//flexible.img.hani.co.kr/flexible/normal/970/388/imgdb/child/2021/1222/53_16401356226909_20211222500762.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1221/53_16400503470659_20211221500694.jpg
//flexible.img.hani.co.kr/flexible/normal/836/334/imgdb/child/2021/1222/52_16401742184284_20211222503096.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401767510812_20211222503719.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401590509436_20211222502847.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401705875707_20211222503623.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401763037188_20211222503701.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401566178198_20211222502643.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401662580804_20211222502723.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401669209199_20211222502677.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401736790375_20211222503655.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401765302945_20211222503714.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401554293592_20211222502536.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1221/53_16400406666801_20211220503524.jpg
//flexible.img.hani.co.kr/flexible/normal/294/176/imgdb/original/2021/1214/5916394613984935.gif
https://flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401574506931_6016401574290488.jpg
https://flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1221/20211221503370.jpg
https://flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401565264144_6016399610369826.jpg
https://flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401576306545_2316401576052245.jpg
https://flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1221/20211221501347.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1217/53_16397179100396_20211217501340.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401721512409_20211222503006.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401782028595_20211222503735.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401655456254_20211222503461.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401760827055_20211222503703.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401639640108_20211222503241.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401727638749_20211222503647.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401712704585_20211222503621.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401615444054_20211222503185.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401753258511_20211222503688.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401384710036_20211221503860.jpg
//flexible.img.hani.co.kr/flexible/normal/499/299/imgdb/child/2021/1222/53_16401493850164_20211222501892.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401572785478_20211222502711.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401641467382_20211222503365.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401764520397_4316401764075122.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1220/53_16399757369142_20210415503616.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401657250972_20211222503466.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1222/20211222503172.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401686093609_20211222503600.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401585257901_20211222502826.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401620115834_20211222503189.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401600110412_20211222503016.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401588112946_20211222502876.jpg
//flexible.img.hani.co.kr/flexible/normal/393/236/imgdb/child/2021/1222/53_16401603990006_20211222503018.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401562213464_20211222502653.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401623363302_20211222503299.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401314787851_20211221503850.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401549760488_20211222502467.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401604003786_20211222503094.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401596431154_20211222502987.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401593486041_20211222502957.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401365550495_20211222500973.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401475631272_20211222501759.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401409898816_20211222501428.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401588780462_20211222502909.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401553140731_20211222502557.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401574432729_20211222502689.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401250007957_20211221503868.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401570481976_20211222502692.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401562603024_20211222502624.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401538164233_20211222502306.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401530740412_20211222502270.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401537808539_20211222502330.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401361522795_20211222500877.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401422774027_20211222501542.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401511736092_20211222501979.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1222/20211222501968.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401395328007_20211222501280.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1222/20211222501610.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401468562721_20211222501712.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401323670008_20211221503856.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401386659937_20211222501123.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401458633727_20211222501671.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401418785063_20211222501525.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401463356765_20211222501574.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401390765183_20211222501163.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401512915321_20211222501993.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401567680495_20211222502717.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401579061171_20211222502681.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401588771361_20211222502898.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401652021341_20211222503409.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401664952083_20211222503530.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401553213109_20211222502549.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1129/8616381659054946.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401662428757_20211222503519.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401356226909_20211222500762.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1221/53_16400797343823_20211221503477.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1221/53_16400680057589_20211221502291.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401404114562_20211222501351.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401431711112_20211222501540.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401416619468_20211222501497.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401345530413_20211221503848.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1221/53_16400594247204_20211221501611.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1221/53_16400888996278_20211221503656.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401676544652_20211222503542.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401654917432_20211222501610.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401673102182_20211222503480.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401479493569_20211222501775.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401687074537_20211222503569.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1214/53_16394615903336_20211214502089.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401673702951_20211222503425.jpg
https://flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1001/3216330571618195.png
https://flexible.img.hani.co.kr/flexible/normal/300/180/imgdb/original/2021/0805/4716281450449311.png
https://flexible.img.hani.co.kr/flexible/normal/300/180/imgdb/original/2021/0104/2116097157957689.jpg
https://flexible.img.hani.co.kr/flexible/normal/300/180/imgdb/original/2021/0104/9116097157957951.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401554293592_20211222502536.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401696598208_20211222503614.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401667013271_20211222503428.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401390765183_20211222501163.jpg
//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/child/2021/1222/53_16401574432729_20211222502689.jpg
//flexible.img.hani.co.kr/flexible/normal/300/120/imgdb/original/2021/0104/3916097157958204.jpg
//flexible.img.hani.co.kr/flexible/normal/300/180/imgdb/original/2021/0104/7616097157957099.jpg
//flexible.img.hani.co.kr/flexible/normal/300/180/imgdb/resize/2016/1029/1477644319_147764428402_20161029.JPG
https://flexible.img.hani.co.kr/flexible/normal/970/582/imgdb/child/2021/1221/53_16400956239873_5616400955996482.jpg
https://flexible.img.hani.co.kr/flexible/normal/970/582/imgdb/child/2021/1222/53_16401404114562_20211222501351.jpg
https://flexible.img.hani.co.kr/flexible/normal/970/582/imgdb/child/2021/1217/53_16397002490077_20211216504129.jpg
https://flexible.img.hani.co.kr/flexible/normal/725/435/imgdb/child/2021/1222/53_16401730145465_20211222503658.jpg
https://flexible.img.hani.co.kr/flexible/normal/970/582/imgdb/child/2021/1222/53_16401356226909_20211222500762.jpg
https://flexible.img.hani.co.kr/flexible/normal/900/540/imgdb/original/2021/1217/20211217502625.jpg
https://flexible.img.hani.co.kr/flexible/normal/970/582/imgdb/child/2021/1217/53_16397012601072_5516397007044683.jpg
https://flexible.img.hani.co.kr/flexible/normal/970/582/imgdb/child/2021/1220/53_16399955190665_20211220502943.jpg
https://img.hani.co.kr/section-image/15/hani/images/common/footer_logo.png

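Many of the image URLs begin with // and so look like commented-out lines (they are protocol-relative URLs), but the fact that the URLs were extracted correctly means the program worked as intended.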
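
If the collected values are to be fetched later, the entries starting with // need a scheme. One possible approach, sketched here with the standard net/url package, is to resolve each src against the page URL. The function name resolveImageURL is hypothetical and not part of the original program.

```go
package main

import (
	"fmt"
	"log"
	"net/url"
)

// resolveImageURL (hypothetical helper) turns a src value, whether absolute,
// protocol-relative, or path-relative, into an absolute URL by resolving it
// against the URL of the page it was found on.
func resolveImageURL(pageURL, src string) (string, error) {
	base, err := url.Parse(pageURL)
	if err != nil {
		return "", err
	}
	ref, err := url.Parse(src)
	if err != nil {
		return "", err
	}
	return base.ResolveReference(ref).String(), nil
}

func main() {
	page := "https://www.hani.co.kr"
	// Two values taken from the output above: one protocol-relative, one absolute.
	for _, src := range []string{
		"//flexible.img.hani.co.kr/flexible/normal/500/300/imgdb/original/2021/1222/20211222503172.jpg",
		"https://img.hani.co.kr/section-image/15/hani/images/common/footer_logo.png",
	} {
		abs, err := resolveImageURL(page, src)
		if err != nil {
			log.Fatal(err)
		}
		fmt.Println(abs)
	}
}
```

A protocol-relative value inherits the scheme of the page (https here), while a value that is already absolute passes through unchanged.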

 

I have built many crawlers in PHP, Perl, and Python, but I find myself wondering whether anything else comes together as easily, concisely, and powerfully as Go.

 

 
