grid_pipeline.fit uses default value of solver parameter instead of GridSearchCV value

Question {#heading}

I am trying to find the best combination of hyperparameters for LogisticRegression in sklearn. Here is an example of my code:

# Imports were not shown in the original post; these are assumed
# (SMOTE as a pipeline step requires the imbalanced-learn Pipeline).
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler

pipeline = Pipeline([("scaler", StandardScaler()),
                     ("smt",    SMOTE(random_state=42)),
                     ("logreg", LogisticRegression())])
parameters = [{'logreg__solver': ['saga']},
              {'logreg__penalty': ['l1', 'l2']},
              {'logreg__C': [1e-3, 0.1, 1, 10, 100]}]
grid_pipeline = GridSearchCV(pipeline,
                             parameters,
                             scoring='f1',
                             n_jobs=5, verbose=5,
                             return_train_score=True,
                             cv=5)
grid_result = grid_pipeline.fit(X_train, y_train)

During fitting I get the following error:

ValueError: Solver lbfgs supports only 'l2' or 'none' penalties, got l1 penalty.

For some reason, the default value 'lbfgs' is used for the solver parameter instead of the chosen 'saga'. Why does this happen?

Answer 1 {#1}

Score: 1

I think the issue is how you have specified parameters. To get the desired behaviour, use a single dict as follows:

parameters = {'logreg__solver': ['saga'],
              'logreg__penalty': ['l1', 'l2'],
              'logreg__C': [1e-3, 0.1, 1, 10, 100]}

You had specified it as a list of dicts. GridSearchCV treats each dict in such a list as a separate grid and searches them independently, leaving any parameter that a dict does not mention at the estimator's default. The second dict sets only logreg__penalty, so it requests l1 with the default lbfgs solver rather than saga, and those two options are not compatible.
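The difference between the two forms can be checked without running a search. The sketch below is a plain-Python approximation of how a parameter grid expands into candidate settings. The `expand` helper is hypothetical and written only for illustration; it is not scikit-learn's own code (scikit-learn exposes `ParameterGrid` in `sklearn.model_selection` for this purpose).

```python
from itertools import product


def expand(param_grid):
    """Expand a dict of lists, or a list of such dicts, into candidate settings.

    Hypothetical helper: each dict is expanded on its own (Cartesian product
    of its value lists), and the results are concatenated.
    """
    grids = [param_grid] if isinstance(param_grid, dict) else param_grid
    candidates = []
    for grid in grids:
        keys = sorted(grid)
        for values in product(*(grid[k] for k in keys)):
            candidates.append(dict(zip(keys, values)))
    return candidates


# The form from the question: three independent grids.
list_of_dicts = [{'logreg__solver': ['saga']},
                 {'logreg__penalty': ['l1', 'l2']},
                 {'logreg__C': [1e-3, 0.1, 1, 10, 100]}]

# The form from this answer: one grid combining all three parameters.
single_dict = {'logreg__solver': ['saga'],
               'logreg__penalty': ['l1', 'l2'],
               'logreg__C': [1e-3, 0.1, 1, 10, 100]}

separate = expand(list_of_dicts)
combined = expand(single_dict)
print(len(separate), len(combined))  # 8 10

# Candidates that set a penalty but leave the solver at its default.
missing_solver = [c for c in separate
                  if 'logreg__penalty' in c and 'logreg__solver' not in c]
print(missing_solver)
```

Under these assumptions the list form yields 8 candidates, two of which set only the penalty and so would run with the default solver, while the single dict yields 10 candidates that all carry 'saga'.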

Answer 2 {#2}

Score: 0

Why are you passing your parameters as a list of dictionaries, instead of a dictionary of lists?

Isn't

parameters = {'solver': ['saga'],
              'penalty': ['l1', 'l2'],
              'C': [0.001, 0.01, 0.1, 1, 10, 100]}

what you want?

Works here. (With the pipeline from the question, each key would also need the logreg__ prefix, e.g. 'logreg__solver'.)
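To illustrate why the key names matter once the estimator is a pipeline, here is a simplified sketch of how `step__param` names can be split and routed to a step. The `split_step_params` function is hypothetical and deliberately stricter than scikit-learn, which also accepts a few pipeline-level keys; it only shows the naming convention.

```python
def split_step_params(params, step_names):
    """Group 'step__param' keys by pipeline step name.

    Simplified illustration, not scikit-learn's implementation: any key
    without a known step prefix is rejected.
    """
    routed = {name: {} for name in step_names}
    for key, value in params.items():
        step, sep, param = key.partition('__')
        if not sep or step not in routed:
            raise ValueError(
                f"Invalid parameter {key!r} for a pipeline with steps {step_names}"
            )
        routed[step][param] = value
    return routed


# Step names taken from the pipeline in the question.
steps = ['scaler', 'smt', 'logreg']

# Prefixed keys reach the LogisticRegression step.
routed = split_step_params(
    {'logreg__solver': 'saga', 'logreg__penalty': 'l1'}, steps
)
print(routed['logreg'])  # {'solver': 'saga', 'penalty': 'l1'}

# An unprefixed key does not name any step, so it is rejected.
try:
    split_step_params({'solver': 'saga'}, steps)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

The unprefixed grid is therefore suitable when GridSearchCV wraps a LogisticRegression directly, and the prefixed grid when it wraps the pipeline.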