```python
# apply stop words
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

num_of_words = []
scores_BernoulliNB = []

vect = CountVectorizer(stop_words='english')
vect.fit(text_train)
num_of_words.append(len(vect.get_feature_names()))
X_train = vect.transform(text_train)
X_test = vect.transform(text_test)
model = BernoulliNB()
model.fit(X_train, y_train)
scores_Ber..
```
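The excerpt above is cut off mid-statement. As a hedged reconstruction, here is a minimal end-to-end version of the same stop-words pipeline; the tiny `text_train`/`y_train`/`text_test`/`y_test` corpus is invented stand-in data, not the IMDB arrays the post actually uses.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

# toy stand-in data (the original post uses the IMDB reviews)
text_train = ['the movie was great and moving',
              'it was a terrible and boring film',
              'a great film with a moving story',
              'the plot was boring and predictable']
y_train = [1, 0, 1, 0]
text_test = ['a boring and predictable movie', 'a great and moving story']
y_test = [0, 1]

# stop_words='english' drops common function words (the, and, it, ...)
# from the vocabulary before counting
vect = CountVectorizer(stop_words='english')
vect.fit(text_train)
# vocabulary size after stop-word removal
# (use get_feature_names_out() on scikit-learn 1.2+)
print(len(vect.get_feature_names()))

X_train = vect.transform(text_train)
X_test = vect.transform(text_test)

model = BernoulliNB()
model.fit(X_train, y_train)
# presumably what the truncated scores_BernoulliNB.append(...) line recorded
print(model.score(X_test, y_test))
```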
Applying stop words: articles, demonstrative pronouns, and other purely conventional words (where, when, the, it, etc.). `max_df` drops terms that appear too often, given as a proportion of documents; `stop_words` specifies an explicit stop-word list. A self-contained sketch of this pruning appears after the next excerpt.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

num_of_words = []
scores_BernoulliNB = []
max_df = np.arange(0.1, 1, 0.1)
for df in max_df:
    vect = CountVectorizer(max_df=df)
    vect.fit(text_train)
    num_of_words.append(len(vect.get_fe..
```
(Shrinking the feature set) Drop words from the vocabulary whose document frequency is below `min_df`:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

num_of_words = []
scores_BernoulliNB = []
min_df = range(1, 10)
for df in min_df:
    vect = CountVectorizer(min_df=df)
    vect.fit(text_train)
    num_of_words.append(len(vect.get_feature_names()))
    X_train = vect.transform(text_train)
    X_test = vect.transform(text_test)
    ..
```
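Both truncated excerpts above sweep a pruning parameter; here is a small self-contained sketch (invented corpus, not from the post) showing how `min_df` and `max_df` shrink the vocabulary. Document i contains words w0..wi, so w0 appears in all 10 documents and w9 in just one.

```python
from sklearn.feature_extraction.text import CountVectorizer

# doc i = "w0 w1 ... wi": word wj appears in exactly 10 - j of the 10 docs
corpus = [' '.join(f'w{j}' for j in range(i + 1)) for i in range(10)]

for df in [1, 2, 5]:
    # min_df (int): drop terms appearing in fewer than df documents
    vect = CountVectorizer(min_df=df).fit(corpus)
    print('min_df =', df, '->', len(vect.get_feature_names()), 'terms')

for df in [0.1, 0.5, 0.9]:
    # max_df (float): drop terms appearing in more than this fraction of docs
    vect = CountVectorizer(max_df=df).fit(corpus)
    print('max_df =', df, '->', len(vect.get_feature_names()), 'terms')
```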
```python
import numpy as np

# load the saved data file
imdb_train, imdb_test = np.load('imdb.npy')

# decode the raw bytes and strip the HTML '<br />' line breaks
text_train = [s.decode().replace('<br />', ' ') for s in imdb_train.data]
text_test = [s.decode().replace('<br />', ' ') for s in imdb_test.data]
y_train = imdb_train.target
y_test = imdb_test.target

from sklearn.feature_extraction.text import CountVectorizer
vect = CountVectorizer()
# fit on the training text
vect.fit(text_train,..
```
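A minimal demo of the decode-and-replace step on invented stand-in bytes (`load_files` returns each review as a byte string that still contains HTML `<br />` tags):

```python
from sklearn.feature_extraction.text import CountVectorizer

# invented stand-in for imdb_train.data
raw = [b"A fine film.<br /><br />Great acting.", b"Dull plot.<br />Weak script."]

# decode the bytes and replace the HTML line breaks with spaces
text = [s.decode().replace('<br />', ' ') for s in raw]
print(text)  # ['A fine film.  Great acting.', 'Dull plot. Weak script.']

vect = CountVectorizer()
vect.fit(text)             # learn the vocabulary from the text
X = vect.transform(text)   # sparse document-term count matrix
print(X.shape)             # (2, 8) here: 'a' is dropped, tokens need 2+ chars
```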
Reference: https://broscoding.tistory.com/203 — sklearn.feature_extraction.text.CountVectorizer, building a bag-of-words (BOW) vocabulary.
```python
# cross-check the vocabulary against a document
# for i in range(X_train[0].shape[1]):
#     if X_train[0, i] > 0:
#         print(i, vect.get_feature_names()[i], X_train[0, i])
# calling get_feature_names() inside the loop is slow -- hoist it into a variable
feature_name = vect.get_feature_names()
for i in range(X_train[0].shape[1]):
    if X_train[0, i] > 0:
        print(i, feature_name[i], X_train[0, i])
'''
1723 actions 1
1741 actors 1
2880 almost 1
3375 and 2
3859 anything 1
4269 are 1
6512..
'''
```
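A hedged alternative to the loop above, reusing `vect` and `X_train` from the excerpt: `X_train` is a SciPy CSR sparse matrix, so iterating over the row's `.indices` and `.data` visits only the nonzero entries instead of testing every column.

```python
feature_name = vect.get_feature_names()
row = X_train[0]                      # one document as a 1-row sparse matrix
for i, count in zip(row.indices, row.data):
    print(i, feature_name[i], count)  # column index, word, count
```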
BOW (bag of words): building the vocabulary

```python
from sklearn.feature_extraction.text import CountVectorizer

ss = ['I am Tom. Tom is me!', 'He is Tom. He is a man']
vect = CountVectorizer()
vect.fit(ss)
'''
CountVectorizer(analyzer='word', binary=False, decode_error='strict',
                dtype=<class 'numpy.int64'>, encoding='utf-8', input='content',
                lowercase=True, max_df=1.0, max_features=None, min_df=1,
                ngram_range=(1, 1), preprocessor=None, stop_wor..
'''
```
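A short follow-on sketch (not in the original excerpt): after `fit`, `vocabulary_` holds the learned word-to-column mapping, and `transform(...).toarray()` shows the counts per sentence.

```python
from sklearn.feature_extraction.text import CountVectorizer

ss = ['I am Tom. Tom is me!', 'He is Tom. He is a man']
vect = CountVectorizer()
vect.fit(ss)

# words are lowercased; single-letter tokens ('I', 'a') are dropped by the
# default token pattern
print(vect.vocabulary_)  # {'am': 0, 'he': 1, 'is': 2, 'man': 3, 'me': 4, 'tom': 5}
print(vect.transform(ss).toarray())
# [[1 0 1 0 1 2]
#  [0 2 2 1 0 1]]
```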
```python
imdb_train.data[6]
```
```
b"This movie has a special way of telling the story, at first i found it rather odd as it jumped through time and I had no idea whats happening. Anyway the story line was although simple, but still very real and touching. You met someone the first time, you fell in love completely, but broke up at last and promoted a deadly agony. Who hasn't go through this? but we will never ..
```
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_files

# read the raw IMDB review files
imdb_train = load_files('data/aclImdb/train')
imdb_test = load_files('data/aclImdb/test')

# save with numpy (so later reloads are fast)
np.save('imdb.npy', [imdb_train, imdb_test])

# reload from the numpy file
imdb_train, imdb_test = np.load('imdb.npy')
```
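One hedged portability note: `np.save` pickles the two `Bunch` objects into an object array, so on NumPy 1.16.3 and later the reload must opt in to pickle loading.

```python
import numpy as np

# allow_pickle=True is required on newer NumPy to read object arrays back
imdb_train, imdb_test = np.load('imdb.npy', allow_pickle=True)
```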