Wednesday, July 15, 2015

Drawing a Word Cloud from a Word Frequency Dictionary

Posted on 7/15/2015 12:13:00 AM by 마크

Summary

Draw a word cloud from a word frequency dictionary prepared in advance.
Reference: http://spartanideas.msu.edu/2014/11/28/turn-your-twitter-timeline-into-a-word-cloud-using-python/

Body

In [1]:

# Display plots inline in the notebook
%pylab inline

Populating the interactive namespace from numpy and matplotlib

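1. Fetching related words from the word frequency dictionary

We work with a frequency dictionary whose entries contain "자전거" (bicycle), built from noun phrases that were extracted with a few patterns.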
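The post does not show how the frequency dictionary itself was built. As a minimal sketch, assuming the noun phrases are already available as a list of strings, it could be produced roughly like this. The function name `build_freq_table` and the sample phrases are hypothetical and are not taken from the original database.

```python
from collections import Counter


def build_freq_table(noun_phrases, keyword):
    """Count each phrase and keep only the phrases that contain `keyword`."""
    counts = Counter(noun_phrases)
    return {word: count for word, count in counts.items() if keyword in word}


# Hypothetical sample data, for illustration only
phrases = ['자전거', '자전거도로', '자전거', '버스', '전기자전거', '자전거']
freq = build_freq_table(phrases, '자전거')
print(freq)
```

In the post the resulting table lives in a PostgreSQL table and is read back with pandas, as in the next cell.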

In [2]:

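# Load libraries
from sqlalchemy import create_engine
import pandas as pd

# Query the database (replace the placeholders with real connection details)
conn_info = 'postgresql://user:password@host:port/database'
e = create_engine(conn_info)
q = 'SELECT * FROM pandas.word_freq_bike;'
df = pd.read_sql(q, e)

# Keep only the 150 most frequent entries
# (DataFrame.sort was removed from pandas; sort_values replaces it)
words = df.sort_values('freq', ascending=False).head(150)[['word', 'freq']].values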

Structure of the frequency dictionary

In [3]:

words[0:5]

Out[3]:

array([['자전거', 9956],
       ['자전거도로', 1008],
       ['자전거전용도로', 313],
       ['자전거길', 284],
       ['전기자전거', 239]], dtype=object)
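Each row of the array is a (word, frequency) pair. As a small sketch using only the standard library, the same pairs can be turned into a plain dictionary, and the top entries can be picked without pandas. The helper `top_n` is a hypothetical name, and the list below simply repeats the five rows shown above.

```python
import heapq

# The five rows shown above, written as plain (word, freq) pairs
pairs = [
    ('자전거', 9956),
    ('자전거도로', 1008),
    ('자전거전용도로', 313),
    ('자전거길', 284),
    ('전기자전거', 239),
]


def top_n(pairs, n):
    """Return the n pairs with the highest frequency, highest first."""
    return heapq.nlargest(n, pairs, key=lambda pair: pair[1])


freq_dict = dict(pairs)  # word -> frequency lookup
top3 = top_n(pairs, 3)
print(top3)
```

A dictionary keyed by word is convenient for lookups, while the sorted pairs keep the ranking explicit.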

2. Drawing a word cloud (WordCloud) from the word frequency dictionary

In [4]:

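from wordcloud import WordCloud, STOPWORDS

# Image whose shape is used as the mask for the cloud
apple_mask = imread('bikelogo.gif', 0)

figure(figsize=(12,8))
# font_path must point to a font that contains Korean glyphs
wordcloud = WordCloud(font_path='fonts/baedal.ttf',
                      stopwords=STOPWORDS,
                      background_color='white',
                      width=1800,
                      height=1400,
                      mask=apple_mask
            ).generate_from_frequencies(dict(words))  # recent wordcloud versions expect a dict of word -> frequency

imshow(wordcloud)
axis("off")
show()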

Out[4]:

[Figure: the generated word cloud, shaped by the mask image bikelogo.gif]
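A word cloud sizes each word according to how frequent it is compared with the others. In the data above, "자전거" (9956) appears roughly ten times as often as the next entry (1008), so it dominates the picture. The sketch below only illustrates that idea of relative weight; it is not the wordcloud library's actual code, and `relative_weights` is a hypothetical helper.

```python
def relative_weights(pairs):
    """Scale each frequency to the range (0, 1] relative to the largest one."""
    pairs = list(pairs)
    if not pairs:
        return {}
    top = max(freq for _, freq in pairs)
    return {word: freq / top for word, freq in pairs}


# Three of the rows from the frequency dictionary shown earlier
weights = relative_weights([('자전거', 9956), ('자전거도로', 1008), ('전기자전거', 239)])
for word, weight in weights.items():
    print(word, round(weight, 3))
```

If the top word crowds out everything else, one option is to drop it from the input before drawing, since every remaining word already contains it.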
