bro's coding

sklearn.TfidfVectorizer(tokenizer=twitter_tag.morphs).LogisticRegression


givemebro 2020. 4. 29. 12:47
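import numpy as np
from konlpy.tag import Okt  # Okt replaces the deprecated Twitter class
from sklearn.feature_extraction.text import TfidfVectorizer
twitter_tag = Okt()  # morphological analyzer; its .morphs method is used as the tokenizer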

 

# Prepare the data (text_train and text_test are assumed to be loaded already)
tfidf = TfidfVectorizer(tokenizer=twitter_tag.morphs, min_df=3)
X_train = tfidf.fit_transform(text_train)
X_test = tfidf.transform(text_test)
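
To see what the `tokenizer` and `min_df` arguments do without installing KoNLPy, here is a minimal sketch on a toy corpus. The `whitespace_tokenizer` function and the `toy_*` names are hypothetical stand-ins, not part of the original post; `whitespace_tokenizer` only plays the role of `twitter_tag.morphs` (any callable that maps a string to a list of tokens works). `min_df=2` is used instead of 3 only because the toy corpus is tiny.

```python
from sklearn.feature_extraction.text import TfidfVectorizer


def whitespace_tokenizer(text):
    # Hypothetical stand-in for twitter_tag.morphs: string in, list of tokens out
    return text.split()


toy_train = [
    "good fun movie",
    "bad boring movie",
    "really good movie",
    "really boring plot",
]
toy_test = ["really good movie unseen"]

# min_df=2 drops tokens that occur in fewer than 2 training documents
toy_tfidf = TfidfVectorizer(tokenizer=whitespace_tokenizer, min_df=2)

# fit_transform learns the vocabulary and IDF weights from the training texts only
toy_X_train = toy_tfidf.fit_transform(toy_train)

# transform reuses that vocabulary; tokens never seen in training are ignored
toy_X_test = toy_tfidf.transform(toy_test)

print(sorted(toy_tfidf.vocabulary_))
print(toy_X_train.shape, toy_X_test.shape)
```

Tokens such as "fun" or "plot" appear in only one training document, so they are left out of the vocabulary, and "unseen" contributes nothing to the test vector.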

 

# Model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)  # mean accuracy on the test set

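# Weights: first row of coef_ (the only row in the binary case)
w = model.coef_[0]

# Indices of the 20 smallest weights
small = np.argsort(w)[:20]

# Indices of the 20 largest weights
big = np.argsort(w)[-20:]

# Concatenate small + big
small_big = np.r_[small, big]

# Look up the words for small_big
# (get_feature_names_out() replaces get_feature_names(), removed in scikit-learn 1.2)
fn = np.array(tfidf.get_feature_names_out())
small_big_name = fn[small_big]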
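
The index selection above can be sketched on a small array to make the ordering visible. The helper `extreme_indices` and the `demo_*` names are hypothetical and not from the original post; the helper also caps `k` at half the number of weights, which is an added assumption so that the two slices never overlap when the vocabulary has fewer than 40 features.

```python
import numpy as np


def extreme_indices(weights, k=20):
    """Return indices of the k smallest weights followed by the k largest."""
    weights = np.asarray(weights)
    # Cap k so the two slices cannot overlap on a small vocabulary
    k = min(k, weights.size // 2)
    order = np.argsort(weights)  # indices sorted from smallest to largest weight
    # Slice from size - k rather than -k so that k == 0 yields an empty slice
    return np.r_[order[:k], order[weights.size - k:]]


demo_w = np.array([0.5, -2.0, 0.1, 3.0, -0.7, 1.2])
demo_names = np.array(["a", "b", "c", "d", "e", "f"])

demo_idx = extreme_indices(demo_w, k=2)
print(demo_idx)              # indices: two most negative, then two most positive
print(demo_w[demo_idx])      # the matching weights, in ascending order
print(demo_names[demo_idx])  # the matching feature names
```

With a helper like this, the bar chart can use `range(len(demo_idx))` instead of a hard-coded 40, so the plot still works when fewer features survive `min_df`.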

 

import matplotlib.pyplot as plt

# Apply a Korean font so Hangul tick labels render correctly
from matplotlib import font_manager, rc
font_name = font_manager.FontProperties(fname="C:/Windows/Fonts/HMFMPYUN.TTF").get_name()
rc('font', family=font_name)

# Visualization: 20 smallest and 20 largest weights with their words on the x-axis
plt.figure(figsize=[20, 20])
plt.bar(range(40), w[small_big])
plt.xticks(range(40), small_big_name, rotation=90)
plt.show()
