
bro's coding

Observing how changing sklearn.feature_extraction.text.CountVectorizer's min_df affects the result


givemebro 2020. 4. 28. 09:57

(Reducing the number of features (words))

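Words that appear in fewer than min_df documents are removed from the vocabulary. With an integer min_df, a word must appear in at least min_df documents to be kept, and it is the number of documents that counts, not the total number of occurrences.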
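To make the threshold concrete, here is a minimal sketch on a tiny made-up corpus. The `docs` list and the `vocabulary_for` helper are illustrative assumptions and are not part of the original experiment.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical toy corpus, only for illustration
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]


def vocabulary_for(min_df):
    """Return the sorted vocabulary that CountVectorizer keeps for this min_df."""
    vect = CountVectorizer(min_df=min_df)
    vect.fit(docs)
    return sorted(vect.vocabulary_)


vocab_1 = vocabulary_for(1)  # every word is kept
vocab_2 = vocabulary_for(2)  # words found in only one document are dropped
vocab_3 = vocabulary_for(3)  # only words found in all three documents remain

for min_df_value, vocab in [(1, vocab_1), (2, vocab_2), (3, vocab_3)]:
    print(min_df_value, len(vocab), vocab)
```

Note that "mat", "log", and "chased" each occur in a single document, so they disappear as soon as min_df is 2.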

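from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# text_train, text_test, y_train, y_test are assumed to be loaded already
num_of_words = []        # vocabulary size for each min_df
scores_BernoulliNB = []  # test accuracy for each min_df
min_df = range(1, 10)

for df in min_df:
    # keep only words that appear in at least `df` training documents
    vect = CountVectorizer(min_df=df)
    vect.fit(text_train)
    # vocabulary_ is used here because get_feature_names() was removed in scikit-learn 1.2
    num_of_words.append(len(vect.vocabulary_))

    X_train = vect.transform(text_train)
    X_test = vect.transform(text_test)

    model = BernoulliNB()
    model.fit(X_train, y_train)

    scores_BernoulliNB.append(model.score(X_test, y_test))
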
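# visualization
import matplotlib.pyplot as plt

# left: vocabulary size for each min_df
plt.subplot(1, 2, 1)
plt.plot(min_df, num_of_words, 'b:o')  # pass min_df as x so points line up with the ticks
plt.xticks(min_df)

# right: BernoulliNB test accuracy for each min_df
plt.subplot(1, 2, 2)
plt.plot(min_df, scores_BernoulliNB, 'b:o')
plt.xticks(min_df)
plt.yticks(rotation=-50)
plt.show()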
