
bro's coding

Observing the effect of changing max_df in sklearn.feature_extraction.text.CountVectorizer


givemebro 2020. 4. 28. 10:25

Applying stop words
Stop words are function words that appear everywhere regardless of topic, such as articles, demonstratives, and pronouns: where, when, the, it, etc.
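As a minimal sketch of what stop-word removal does, the snippet below fits CountVectorizer twice on two made-up sentences, once without a stop-word list and once with scikit-learn's built-in English list (`stop_words="english"`). The sentences and the variable names `docs`, `plain`, `filtered`, `removed`, and `kept` are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy documents (hypothetical, for illustration only)
docs = [
    "the cat sat on the mat",
    "where is the dog when it rains",
]

# Vocabulary without any stop-word filtering
plain = CountVectorizer().fit(docs)

# Vocabulary with scikit-learn's built-in English stop-word list
filtered = CountVectorizer(stop_words="english").fit(docs)

# Terms dropped by the stop-word list, and terms that survive
removed = sorted(set(plain.vocabulary_) - set(filtered.vocabulary_))
kept = sorted(filtered.vocabulary_)

print("removed:", removed)
print("kept:", kept)
```

Words like "the" and "it" end up in `removed`, while content words like "cat" and "dog" stay in `kept`.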

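max_df : ignores terms that appear in too many documents; a float is interpreted as a proportion of the documents
stop_words : specifies the list of stop words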
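To make the max_df threshold concrete, here is a small sketch on a made-up four-document corpus. A term is dropped when its document frequency is strictly greater than max_df times the number of documents. The corpus and the names `docs` and `vocab_by_max_df` are assumptions for illustration only.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (hypothetical). Document frequencies out of 4 documents:
#   the 4, was 4, movie 2, great 2, boring 1, plot 1, acting 1, weak 1
docs = [
    "the movie was great",
    "the movie was boring",
    "the plot was great",
    "the acting was weak",
]

vocab_by_max_df = {}
for max_df in (1.0, 0.5, 0.25):
    # Terms appearing in more than max_df * len(docs) documents are ignored
    vect = CountVectorizer(max_df=max_df).fit(docs)
    vocab_by_max_df[max_df] = sorted(vect.vocabulary_)
    print(max_df, len(vect.vocabulary_), vocab_by_max_df[max_df])
```

With max_df=0.5 the terms found in every document ("the", "was") disappear, and with max_df=0.25 only the terms that occur in a single document remain.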

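import numpy as np
import matplotlib.pyplot as plt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# text_train, text_test, y_train, y_test are assumed to be defined already
# (raw documents and their labels).

num_of_words = []
scores_BernoulliNB = []
max_df = np.arange(0.1, 1, 0.1)  # 0.1, 0.2, ..., 0.9


for df in max_df:
    # Ignore terms that appear in more than a fraction `df` of the training documents
    vect = CountVectorizer(max_df=df)
    vect.fit(text_train)
    # Vocabulary size (get_feature_names() was removed in newer scikit-learn releases)
    num_of_words.append(len(vect.vocabulary_))
    X_train = vect.transform(text_train)
    X_test = vect.transform(text_test)

    model = BernoulliNB()
    model.fit(X_train, y_train)
    scores_BernoulliNB.append(model.score(X_test, y_test))


# Left: vocabulary size vs. max_df; right: test accuracy vs. max_df
plt.subplot(1, 2, 1)
plt.plot(max_df, num_of_words, 'b:o')
plt.xticks(max_df)
plt.subplot(1, 2, 2)
plt.plot(max_df, scores_BernoulliNB, 'b:o')
plt.yticks(rotation=-50)
plt.show()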
