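I have a dataset and used pd.get_dummies to one-hot encode the target column (it contains 5 distinct strings across the whole column). I then used sklearn's train_test_split function to create the training, test, and validation sets, and standardized the features with StandardScaler(), fitting the scaler on the training set. Finally, I fit a logistic regression model on the training features and target.
I am now trying to compute accuracy scores for the training, validation, and test sets, but I am having no luck. The code for this part is below: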
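import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

dataset = pd.read_csv('tabular_data/clean_tabular_data.csv')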
features, label = load_airbnb(dataset, 'Category')
label_series = dataset['Category']
label_encoded = pd.get_dummies(label_series)
print(label_encoded.shape)
print(label_encoded)
X_train, X_test, y_train, y_test = train_test_split(features, label_encoded, test_size=0.3)
X_test, X_validation, y_test, y_validation = train_test_split(X_test, y_test, test_size=0.5)
# normalize the features
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_validation_scaled = scaler.transform(X_validation)
X_test_scaled = scaler.transform(X_test)
# get baseline classification model
model = LogisticRegression()
print(y_train)
print(X_train_scaled.shape)
y_train = y_train.iloc[:, 0]
print(y_train.shape)
model.fit(X_train_scaled, y_train)
y_train_pred = model.predict(X_train_scaled)
y_train_pred = np.argmax(y_train_pred, axis=0)
y_validation_pred = model.predict(X_validation_scaled)
y_validation_pred = np.argmax(y_validation_pred, axis=0)
y_test_pred = model.predict(X_test_scaled)
y_test_pred = np.argmax(y_test_pred, axis=0)
# evaluate model using accuracy
train_acc = accuracy_score(y_train, y_train_pred)
test_acc = accuracy_score(y_test, y_test_pred)
validation_acc = accuracy_score(y_validation, y_validation_pred)
The error I get is as follows: "File "C:\Users\lcox1\Documents\VSCode\AiCore\Data science\classification_prac.py", line 56, in train_acc = accuracy_score(y_train, y_train_pred)
TypeError: Singleton array 16 cannot be considered a valid collection."
I am fairly new to Python, so I am not sure what the problem is. Any help is appreciated.
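A likely cause, sketched under assumptions: scikit-learn's LogisticRegression.predict already returns a 1-D array of class labels, one per row. Calling np.argmax(..., axis=0) on that array collapses it to a single integer (such as 16), and accuracy_score then rejects it as a "singleton array". The sketch below fits on the original label column rather than the get_dummies frame and scores the predictions directly. The synthetic data from make_classification and the category names "A" to "E" are stand-ins for the real Airbnb features and 'Category' values, which are not shown in the question.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the real features and 'Category' column.
X, y_int = make_classification(
    n_samples=500, n_features=8, n_informative=5, n_classes=5, random_state=0
)
categories = np.array(["A", "B", "C", "D", "E"])  # hypothetical class names
labels = categories[y_int]  # 1-D array of strings, like dataset['Category']

# Split using the 1-D string labels; no one-hot encoding is needed here.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, labels, test_size=0.3, random_state=0
)
X_test, X_validation, y_test, y_validation = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0
)

# Fit the scaler on the training set only, then apply it to every split.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_validation_scaled = scaler.transform(X_validation)
X_test_scaled = scaler.transform(X_test)

model = LogisticRegression(max_iter=1000)
model.fit(X_train_scaled, y_train)

# predict() returns one label per row, so it is passed to accuracy_score as-is.
y_train_pred = model.predict(X_train_scaled)
y_validation_pred = model.predict(X_validation_scaled)
y_test_pred = model.predict(X_test_scaled)

train_acc = accuracy_score(y_train, y_train_pred)
validation_acc = accuracy_score(y_validation, y_validation_pred)
test_acc = accuracy_score(y_test, y_test_pred)
print(train_acc, validation_acc, test_acc)

# What the question's code does: argmax over a 1-D array yields one scalar,
# which is the "singleton" that accuracy_score refuses to accept.
collapsed = np.argmax(np.array([0, 1, 0, 1]), axis=0)
print(np.ndim(collapsed))  # 0 dimensions: a single number, not a collection
```

Note that y_train.iloc[:, 0] in the question keeps only the first dummy column, so that model would learn "first category vs. the rest" rather than all 5 classes. If the one-hot frame is needed elsewhere, np.argmax(label_encoded.values, axis=1) (axis=1, not axis=0) would recover one class index per row.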