- scikit-learn makes it easy to generate synthetic data.
Sample Code - Generating
import sklearn.datasets as dt

# Generate 100 samples with 2 informative features and 2 classes
data = dt.make_classification(
    n_samples=100,
    n_features=2,
    n_repeated=0,
    n_classes=2,
    n_redundant=0,
)
X, y = data[0], data[1]
y = y.astype(int)
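A quick way to confirm what the generator returned is to look at the array shapes and the class counts. The sketch below is self-contained; the `random_state=123` argument is an assumption added here only to make the run reproducible and is not part of the original call.

```python
import numpy as np
import sklearn.datasets as dt

# random_state is an assumption for reproducibility (not in the original call)
X, y = dt.make_classification(
    n_samples=100,
    n_features=2,
    n_repeated=0,
    n_classes=2,
    n_redundant=0,
    random_state=123,
)

print(X.shape, y.shape)    # feature matrix and label vector shapes
counts = np.bincount(y)    # number of samples per class label
print(counts)
```

The two classes are roughly balanced by default, though not necessarily exactly 50/50, since `make_classification` randomly flips a small fraction of labels unless `flip_y=0` is passed.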
Sample Code - Shuffling
import numpy as np

# Shuffle the sample indices with a fixed seed so the split is reproducible
shuffle_idx = np.arange(y.shape[0])
shuffle_rng = np.random.RandomState(123)
shuffle_rng.shuffle(shuffle_idx)

# Reorder the data once, then use the first 70 samples for training
X, y = X[shuffle_idx], y[shuffle_idx]
X_train, X_test = X[:70], X[70:]
y_train, y_test = y[:70], y[70:]
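As an alternative to shuffling by hand, scikit-learn's `train_test_split` can shuffle and split in one call. This is a minimal sketch rather than the approach used in these notes. It regenerates a dataset so it runs on its own, and the `random_state` values are assumptions chosen for reproducibility. The resulting sample order will not match the manual `RandomState(123)` shuffle.

```python
import sklearn.datasets as dt
from sklearn.model_selection import train_test_split

# Regenerate a small dataset so this sketch is self-contained
# (random_state here is an assumption, not part of the original notes)
X, y = dt.make_classification(
    n_samples=100,
    n_features=2,
    n_repeated=0,
    n_classes=2,
    n_redundant=0,
    random_state=123,
)

# test_size=0.3 mirrors the 70/30 split; shuffling is on by default
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123
)
print(X_train.shape, X_test.shape)
```

Passing `stratify=y` would additionally keep the class proportions the same in both splits, which the manual shuffle does not guarantee.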
Sample Code - Normalizing
# Standardize each feature using statistics from the training set only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
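A simple sanity check for this step is that, after standardization, every training feature has mean close to 0 and standard deviation close to 1. The sketch below assumes per-feature standardization (`axis=0` for both the mean and the standard deviation) and uses hypothetical random stand-in data so it runs on its own. The test set is scaled with the training statistics, so its mean and standard deviation will only be approximately 0 and 1.

```python
import numpy as np

# Hypothetical stand-in data: two features on very different scales
rng = np.random.RandomState(0)
X_train = rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(70, 2))
X_test = rng.normal(loc=[5.0, -2.0], scale=[3.0, 0.5], size=(30, 2))

# Per-feature statistics computed on the training set only
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train_std = (X_train - mu) / sigma
X_test_std = (X_test - mu) / sigma

print(X_train_std.mean(axis=0))  # each entry close to 0
print(X_train_std.std(axis=0))   # each entry close to 1
```

If `std()` were called without `axis=0`, it would return a single scalar over all entries, and features on different scales would not each end up with unit variance.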
Visualizing the training set with the code below gives the following.
[Figure: scatter plot of the training set]
import matplotlib.pyplot as plt

# Plot each class with its own marker
plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1], label='class 0', marker='o')
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1], label='class 1', marker='s')
plt.title('Training set')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.legend()
plt.show()
