• scikit-learn makes it easy to generate synthetic data.

Sample Code - Generating

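import sklearn.datasets as dt

# Generate 100 samples with 2 features and 2 classes.
# n_informative defaults to 2, so both features are informative.
data = dt.make_classification(
    n_samples=100,
    n_features=2,
    n_repeated=0,
    n_classes=2,
    n_redundant=0,
)

# make_classification returns a tuple (X, y).
X, y = data[0], data[1]
y = y.astype(int)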
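
As a quick sanity check on the generated data, the sketch below prints the array shapes and the number of samples per class. The `random_state=0` argument and the explicit `n_informative=2` are additions for reproducibility and clarity; they are not part of the original snippet.

```python
import numpy as np
import sklearn.datasets as dt

# random_state is an assumption added here so that repeated runs give the same data.
X, y = dt.make_classification(
    n_samples=100,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_repeated=0,
    n_classes=2,
    random_state=0,
)

print(X.shape, y.shape)   # feature matrix and label vector shapes
print(np.bincount(y))     # number of samples in each class
```

The class counts are roughly balanced by default, but not necessarily exactly 50/50, because `make_classification` randomly flips a small fraction of labels (`flip_y`).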
 

Sample Code - Shuffling

import numpy as np

# Shuffle the sample indices with a fixed seed for reproducibility.
shuffle_idx = np.arange(y.shape[0])
shuffle_rng = np.random.RandomState(123)
shuffle_rng.shuffle(shuffle_idx)
X, y = X[shuffle_idx], y[shuffle_idx]

# X and y are already shuffled, so slice them directly: 70 train, 30 test.
X_train, X_test = X[:70], X[70:]
y_train, y_test = y[:70], y[70:]
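
The same shuffle-and-split step can also be done with scikit-learn's `train_test_split`. This is a minimal sketch of an alternative, not the method used above: it will not reproduce the exact permutation produced by `RandomState(123).shuffle`, and the `random_state` values here are assumptions chosen only for reproducibility.

```python
import sklearn.datasets as dt
from sklearn.model_selection import train_test_split

X, y = dt.make_classification(
    n_samples=100,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_repeated=0,
    n_classes=2,
    random_state=0,  # assumption: fixed seed for reproducible data
)

# An integer train_size is an absolute sample count: 70 train, 30 test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=70, shuffle=True, random_state=123
)

print(X_train.shape, X_test.shape)
```

Passing `stratify=y` would additionally keep the class proportions the same in both splits.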

Sample Code - Normalizing

# Compute per-feature statistics on the training set only, then apply them to both sets.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mu) / sigma
X_test = (X_test - mu) / sigma
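
Standardization with training-set statistics can also be expressed with scikit-learn's `StandardScaler`, which computes a per-feature mean and standard deviation. The sketch below uses randomly generated stand-in arrays (an assumption, so the block runs on its own) and checks that the scaler matches the manual per-feature formula.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in data for illustration only; in the notes these would be the real splits.
rng = np.random.RandomState(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(70, 2))
X_test = rng.normal(loc=5.0, scale=2.0, size=(30, 2))

# Fit on the training set only, then apply the same transform to both sets.
scaler = StandardScaler().fit(X_train)
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)

# Manual equivalent: per-feature mean and (population) standard deviation.
mu, sigma = X_train.mean(axis=0), X_train.std(axis=0)
print(np.allclose(X_train_std, (X_train - mu) / sigma))
print(np.allclose(X_test_std, (X_test - mu) / sigma))
```

Fitting on the training set alone keeps information from the test set out of the preprocessing step.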

Visualizing the training set with the code below gives the following.

plot

import matplotlib.pyplot as plt

# Plot each class of the training set with its own marker.
plt.scatter(X_train[y_train == 0, 0], X_train[y_train == 0, 1], label='class 0', marker='o')
plt.scatter(X_train[y_train == 1, 0], X_train[y_train == 1, 1], label='class 1', marker='s')
plt.title('Training set')
plt.xlabel('feature 1')
plt.ylabel('feature 2')
plt.xlim([-3, 3])
plt.ylim([-3, 3])
plt.legend()
plt.show()