Posts in the '[AI]/python.sklearn' category

List: [AI]/python.sklearn (95)

bro's coding

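Why use an activation function

On its own, the computation in a neural network is linear: each layer is a weighted sum, and stacking linear layers still yields a linear function. An activation function is applied to make the computation nonlinear.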
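The point can be checked numerically. The following is a minimal sketch, not code from the original post: the weight matrices `W1` and `W2`, the input `x`, and the choice of ReLU as the activation are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # first-layer weights (illustrative)
W2 = rng.normal(size=(4, 2))   # second-layer weights (illustrative)
x = rng.normal(size=(5, 3))    # 5 samples with 3 features each

# Two stacked linear layers with no activation in between ...
two_linear_layers = (x @ W1) @ W2
# ... give the same output as a single linear layer with weights W1 @ W2.
one_linear_layer = x @ (W1 @ W2)
collapses = np.allclose(two_linear_layers, one_linear_layer)


def relu(z):
    """Element-wise ReLU: negative values are replaced by 0."""
    return np.maximum(z, 0.0)


# With ReLU between the layers, the network is no longer that single linear map.
with_relu = relu(x @ W1) @ W2
differs = not np.allclose(with_relu, one_linear_layer)
```

Here `collapses` confirms that depth alone adds nothing to a purely linear network, while `differs` shows that the activation changes the function being computed.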

[AI]/python.sklearn 2020. 7. 3. 17:15

sklearn.TfidfVectorizer(tokenizer=twitter_tag.morphs).LogisticRegression

from konlpy.tag import Twitter, Okt
from sklearn.feature_extraction.text import TfidfVectorizer

# prepare the data
tfidf = TfidfVectorizer(tokenizer=twitter_tag.morphs, min_df=3)
X_train = tfidf.fit_transform(text_train)
X_test = tfidf.transform(text_test)

# model
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
model.score(X_test, y_test)

# weights
w = model.co..

[AI]/python.sklearn 2020. 4. 29. 12:47

sklearn.decomposition.LatentDirichletAllocation

Document clustering (topic modeling)
Topic modeling: an unsupervised learning task that assigns documents to topics.
LDA (Latent Dirichlet Allocation): finds the word components that make up the documents (similar to PCA).

imdb_train, imdb_test = np.load('imdb.npy')
text_train = [s.decode().replace(' ', '') for s in imdb_train.data]
y_train = imdb_train.target

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(max_features=10000, max_df=0.15)
X = vect.fit_tr..
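As a rough illustration of the workflow described above, here is a small self-contained sketch. It does not use the post's IMDB data: the four toy documents, `n_components=2`, and `random_state=0` are assumptions chosen only to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus (hypothetical); the original post works on IMDB reviews.
docs = [
    "the actor gave a great performance in this film",
    "a great film with a great actor",
    "the team won the game in the final minute",
    "the player scored and the team won the game",
]

# Bag-of-words counts are the input to LDA.
vect = CountVectorizer()
X = vect.fit_transform(docs)

# Fit LDA with two topics and get the per-document topic mixture.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(X)       # shape: (n_documents, n_topics)

# lda.components_ holds one row of word weights per topic.
topic_word_shape = lda.components_.shape  # (n_topics, n_vocabulary_words)
```

Each row of `doc_topics` is a distribution over topics for one document, and each row of `lda.components_` weights the vocabulary words for one topic, which is the sense in which the decomposition resembles PCA.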

[AI]/python.sklearn 2020. 4. 28. 17:58

sklearn.feature_extraction.text.CountVectorizer.ngram.LogisticRegression.print only two-word features

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(ngram_range=(1, 2))
X_train = vect.fit_transform(text_train)

from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

# extract the features made up of two words
fn = np.array(vect.get_feature_names())
mask = np.array([s.find(' ')>=0 for s in..

[AI]/python.sklearn 2020. 4. 28. 16:24

sklearn.feature_extraction.text.CountVectorizer.applying ngram_range

ngram: a vocabulary built from sequences of n consecutive words
ex) s='I am Tam'
2gram: [I am], [am Tam]

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer(ngram_range=(1, 2))
X_train = vect.fit_transform(text_train)
len(vect.get_feature_names())
1522634

# quick check of the entries made up of two words #1
# count = 0
# for key, value in vect.vocabulary_.items():
#     # value = vect.vocabulary_[key]
#     if len(key.split()) > 1:
#         print(key,value..
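The 'I am Tam' example can be reproduced with a few lines of plain Python. This is only a sketch of the idea: `word_ngrams` is a hypothetical helper, and it splits on whitespace, whereas `CountVectorizer` lowercases the text and, with its default token pattern, drops single-character tokens such as "I".

```python
def word_ngrams(sentence, n):
    """Return all runs of n consecutive whitespace-separated words."""
    tokens = sentence.split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


s = "I am Tam"
unigrams = word_ngrams(s, 1)   # ['I', 'am', 'Tam']
bigrams = word_ngrams(s, 2)    # ['I am', 'am Tam']

# ngram_range=(1, 2) corresponds to using both lists as the vocabulary.
vocabulary = unigrams + bigrams
```

Because every adjacent word pair becomes its own feature, the vocabulary grows quickly, which is consistent with the 1,522,634 features reported above.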

[AI]/python.sklearn 2020. 4. 28. 15:23

sklearn.feature_extraction.text.TfidfTransformer.applying LogisticRegression

from sklearn.feature_extraction.text import TfidfVectorizer
vect = TfidfVectorizer(min_df=5)
vect.fit(text_train)
X_train = vect.transform(text_train)

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)

X_test = vect.transform(text_test)
display(model.score(X_test, y_test), model.coef_)

w = model.coef_[0]
index_small = np.argsort(w)[:20]
index_big = np.arg..

[AI]/python.sklearn 2020. 4. 28. 12:54

sklearn.feature_extraction.text.TfidfTransformer

https://ko.wikipedia.org/wiki/Tf-idf

TF-IDF (Term Frequency - Inverse Document Frequency) is a weight used in information retrieval and text mining. Given a collection of documents, it is a statistic that indicates how important a word is within a particular document. It can be used to extract a document's keywords, to rank results in a search engine, or to measure how similar documents are to one another. TF (term frequency, ..

# applying tf-idf
# applying term frequency - inverse document frequency
# A word appears many times in one document. There must be a reason for that..
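To make the weighting concrete, here is a minimal sketch using the textbook form tf × log(N / df). The three toy documents and the `tf_idf` helper are assumptions for illustration; scikit-learn's `TfidfTransformer` uses a smoothed variant by default, idf = ln((1 + n) / (1 + df)) + 1, followed by L2 normalization, so its numbers will differ.

```python
import math
from collections import Counter

# Toy corpus (hypothetical), already split into words.
docs = [
    "the movie was great".split(),
    "the movie was boring".split(),
    "the plot was great".split(),
]


def tf_idf(docs):
    """Return one {word: tf * log(N / df)} dictionary per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each word in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights


weights = tf_idf(docs)
```

A word that occurs in every document, such as "the", gets weight 0, while "boring", which occurs in only one document, gets the largest weight, log(3).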

[AI]/python.sklearn 2020. 4. 28. 11:22

sklearn.feature_extraction.text.CountVectorizer.applying stop_words

# applying stop_words
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

num_of_words = []
scores_BernoulliNB = []

vect = CountVectorizer(stop_words='english')
vect.fit(text_train)
num_of_words.append(len(vect.get_feature_names()))

X_train = vect.transform(text_train)
X_test = vect.transform(text_test)

model = BernoulliNB()
model.fit(X_train, y_train)
scores_Ber..

[AI]/python.sklearn 2020. 4. 28. 10:33

sklearn.feature_extraction.text.CountVectorizer.observing the effect of varying max_df

Applying stop words
Stop words: articles, demonstrative pronouns, and other words used out of habit, such as where, when, the, it, etc.
max_df: drops words that appear too often (given as a proportion of documents)
stop_words: specifies the list of stop words

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

num_of_words = []
scores_BernoulliNB = []
max_df = np.arange(0.1, 1, 0.1)

for df in max_df:
    vect = CountVectorizer(max_df=df)
    vect.fit(text_train)
    num_of_words.append(len(vect.get_fe..
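The effect of `max_df` and `stop_words` is easy to see on a tiny corpus. This is a sketch under assumptions: the four toy sentences and the threshold `max_df=0.5` are illustrative and not taken from the original post.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (hypothetical): "the" is in 4 documents, "was" in 3,
# "movie" and "great" in 2, and every other word in 1.
docs = [
    "the movie was great",
    "the movie was boring",
    "the plot was great",
    "the ending surprised everyone",
]

# Reference vocabulary with no filtering.
full_vocab = set(CountVectorizer().fit(docs).vocabulary_)

# A float max_df is a proportion: drop words found in more than 50% of documents.
vect = CountVectorizer(max_df=0.5)
vect.fit(docs)
dropped_by_max_df = sorted(full_vocab - set(vect.vocabulary_))

# stop_words='english' removes words from scikit-learn's built-in English list.
vect_sw = CountVectorizer(stop_words="english")
vect_sw.fit(docs)
the_removed = "the" not in vect_sw.vocabulary_
```

With this corpus, `dropped_by_max_df` contains only "the" and "was"; "movie" sits exactly at the 50% threshold and is kept, since only words above the threshold are removed.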

[AI]/python.sklearn 2020. 4. 28. 10:25

sklearn.feature_extraction.text.CountVectorizer.observing the effect of varying min_df

(Reducing the number of features (words)) Removes from the vocabulary the words that appear in fewer than min_df documents.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

num_of_words = []
scores_BernoulliNB = []
min_df = range(1, 10)

for df in min_df:
    vect = CountVectorizer(min_df=df)
    vect.fit(text_train)
    num_of_words.append(len(vect.get_feature_names()))
    X_train = vect.transform(text_train)
    X_test = vect.transform(text_test)
    ..
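A small sketch of how the vocabulary shrinks as `min_df` grows. The toy corpus and the values 1 to 3 are assumptions for illustration. With an integer `min_df`, `CountVectorizer` keeps a word only if it occurs in at least that many documents.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Toy corpus (hypothetical): "the" is in 4 documents, "was" in 3,
# "movie" and "great" in 2, and the remaining five words in 1 each.
docs = [
    "the movie was great",
    "the movie was boring",
    "the plot was great",
    "the ending surprised everyone",
]

vocab_sizes = {}
for min_df in (1, 2, 3):
    vect = CountVectorizer(min_df=min_df)
    vect.fit(docs)
    # vocabulary_ maps each kept word to its column index.
    vocab_sizes[min_df] = len(vect.vocabulary_)
```

Using `len(vect.vocabulary_)` avoids `get_feature_names()`, which newer scikit-learn releases replaced with `get_feature_names_out()`.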

[AI]/python.sklearn 2020. 4. 28. 09:57
