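I have a model whose residuals I want to analyze. Ultimately, I want to identify, for each day, the extreme results that fall outside a confidence interval. However, I am struggling to compute the pointwise standard deviation of the residuals across the individual models in a bagging regressor.
My sample code is below: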
import datetime

import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

# Sample DataFrame
df = pd.DataFrame(np.random.randint(0, 200, size=(500, 4)), columns=list('ABCD'))

# Add dates to the sample data
base = datetime.datetime.today()
date_list = [base - datetime.timedelta(days=x) for x in range(500)]
df['date'] = date_list
df['date'] = df['date'].astype('str')

# Function to one-hot encode a feature and bind it back to the data
def encode_and_bind(data_in, feature_to_encode):
    dummies = pd.get_dummies(data_in[[feature_to_encode]])
    data_out = pd.concat([data_in, dummies], axis=1)
    data_out = data_out.drop([feature_to_encode], axis=1)
    return data_out

# Encode before splitting so that train and test share the same dummy columns
features = encode_and_bind(df[['B', 'C', 'D', 'date']], 'date')

# Split dataset into training (first 80%) and testing (last 20%)
split = int(len(df) * 0.80)
X_train_final, X_test_final = features[:split], features[split:]
y_train = df[['A']][:split]
y_test = df[['A']][split:].copy()

# Define model (the estimator is passed positionally, since its keyword
# name differs between scikit-learn versions)
svr_lin = SVR(kernel="linear", C=100, gamma="auto")
regr = BaggingRegressor(svr_lin, random_state=5).fit(X_train_final, y_train.values.ravel())

# Predictions
y_pred = regr.predict(X_test_final)

# Join the predictions back into the original dataframe
y_test['predict'] = y_pred

# Calculate residuals
y_test['residuals'] = y_test['A'] - y_test['predict']
I found this approach online:
raw_pred = [x.predict([[0, 0, 0, 0]]) for x in regr.estimators_]
But I am not sure what to pass in place of the [[0, 0, 0, 0]] argument in x.predict([[0, 0, 0, 0]]), since I have more than 4 features.
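For reference, here is a minimal sketch of how the per-model predictions might be collected. It assumes the argument to `x.predict` is simply the feature matrix to predict on, with one column per feature the model was trained on, rather than a row of four zeros. The synthetic data, the five-feature shape, `n_estimators=10`, and the 1.96 cutoff are all assumptions made for illustration and do not come from the code above.

```python
import numpy as np
from sklearn.ensemble import BaggingRegressor
from sklearn.svm import SVR

# Hypothetical stand-in data with 5 numeric features (more than the
# 4 zeros used in the snippet found online).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(scale=0.5, size=120)
X_train, X_test = X[:100], X[100:]
y_train, y_test = y[:100], y[100:]

regr = BaggingRegressor(SVR(kernel="linear", C=1.0), n_estimators=10, random_state=5)
regr.fit(X_train, y_train)

# Each fitted sub-model predicts on the whole test matrix. The column
# indices each sub-model was trained on are stored in estimators_features_.
raw_pred = np.array([
    est.predict(X_test[:, cols])
    for est, cols in zip(regr.estimators_, regr.estimators_features_)
])  # shape: (n_estimators, n_test_rows)

# Pointwise mean and standard deviation across the sub-models
pred_mean = raw_pred.mean(axis=0)
pred_std = raw_pred.std(axis=0, ddof=1)

# Residual of every sub-model at every test point. Because y_test is fixed
# per point, the pointwise spread of the residuals equals that of the predictions.
resid = y_test - raw_pred
resid_std = resid.std(axis=0, ddof=1)

# Flag test points whose ensemble residual lies outside +/- 1.96 * std
# (1.96 is an assumed normal-approximation cutoff, not from the question).
outside = np.abs(y_test - pred_mean) > 1.96 * pred_std
```

With a DataFrame such as `X_test_final`, the same indexing would presumably need a NumPy array first, for example `X_test_final.to_numpy(dtype=float)[:, cols]`. Note that the spread across bagged sub-models reflects disagreement within the ensemble only, so treating it as a confidence interval is a rough approximation.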