Saving and Loading PyTorch Models - 工具盒子

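Training a neural network can take days, weeks, or even months. To avoid repeating that expensive training every time the model is used, we serialize the model to disk and deserialize it back into memory when it is needed.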

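PyTorch provides two ways to save a model:

- Serialize the model object directly
- Save only the model's parameters

Serializing the model object directly {#title-0}
=======================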

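import os
import pickle

import torch
import torch.nn as nn


class Model(nn.Module):

    def __init__(self, input_size, output_size):

        super(Model, self).__init__()
        self.linear1 = nn.Linear(input_size, input_size * 2)
        self.linear2 = nn.Linear(input_size * 2, output_size)

    def forward(self, inputs):

        inputs = self.linear1(inputs)
        output = self.linear2(inputs)
        return output


def test01():

    model = Model(128, 10)

    # Create the output directory first; torch.save does not create it
    os.makedirs('model', exist_ok=True)

    # 1st argument: the model to save
    # 2nd argument: the file path to save to
    # 3rd argument: the module used for pickling
    # 4th argument: the pickle protocol version
    torch.save(model, 'model/test_model_save.bin', pickle_module=pickle, pickle_protocol=2)


def test02():

    # 1st argument: the path to load from
    # 2nd argument: the device to load the model onto
    # 3rd argument: the module used for unpickling
    # weights_only=False is needed to unpickle a whole model object on
    # PyTorch versions where torch.load defaults to weights_only=True (2.6+).
    # The argument assumes PyTorch 1.13 or newer; only use it with trusted files.
    model = torch.load('model/test_model_save.bin', map_location='cpu',
                       pickle_module=pickle, weights_only=False)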


if __name__ == '__main__':
    test01()
    test02()

Python's pickle module supports several serialization protocols. For details, see the official site: https://www.python.org/search/?q=pickle+protocol
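As a rough illustration of what the protocol number means, the sketch below uses only the standard library `pickle` module. The `payload` dictionary is a made-up stand-in for a checkpoint, not something from the code above. It serializes the same object with every protocol the running interpreter supports and records the size of each result.

```python
import pickle

# A small, arbitrary payload standing in for a model checkpoint (hypothetical).
payload = {"weights": [0.1, 0.2, 0.3], "epoch": 5}

print("default protocol:", pickle.DEFAULT_PROTOCOL)
print("highest protocol:", pickle.HIGHEST_PROTOCOL)

# Serialize the same object with every protocol this interpreter supports.
sizes = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(payload, protocol=protocol)
    sizes[protocol] = len(data)
    # Whatever the protocol, loading must give back an equal object.
    assert pickle.loads(data) == payload

print("bytes per protocol:", sizes)
```

Lower protocol numbers are readable by older Python versions, while higher ones are generally more compact and faster. This is why `pickle_protocol=2` can be a reasonable choice when the file must be read by an old interpreter.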

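When a model that lives on a GPU is saved with torch.save, the file records which device each tensor came from. When the file is loaded again, the tensors are first deserialized on the CPU and then moved to the device they were saved from, for example cuda:0 or cuda:1. If the machine doing the loading has no GPU, loading can fail. In that case, pass map_location='cpu' to load everything onto the CPU.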
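The following is a minimal sketch of loading with `map_location`, not the only way to do it. It assumes a small `nn.Linear` layer as a stand-in model and a temporary file path, both chosen only for illustration. It also saves the `state_dict` rather than the whole object so that it loads under the default `torch.load` settings.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in model; any nn.Module behaves the same way here.
model = nn.Linear(4, 2)

# Use the first GPU when one is present, otherwise stay on the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear_state.bin")
    torch.save(model.state_dict(), path)

    # map_location remaps every stored tensor to the CPU while loading,
    # so this also works for a file that was written on a GPU machine.
    state = torch.load(path, map_location="cpu")

loaded_devices = {name: tensor.device.type for name, tensor in state.items()}
print(loaded_devices)
```

After loading onto the CPU, the model can still be moved to an available GPU later with `model.to(device)`.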

Saving the model's parameters {#title-1}
=======================

import os

import torch
import torch.nn as nn
import torch.optim as optim


class Model(nn.Module):

    def __init__(self, input_size, output_size):

        super(Model, self).__init__()
        self.linear1 = nn.Linear(input_size, input_size * 2)
        self.linear2 = nn.Linear(input_size * 2, output_size)

    def forward(self, inputs):

        inputs = self.linear1(inputs)
        output = self.linear2(inputs)
        return output



def test01():

    model = Model(128, 10)
    optimizer = optim.Adam(model.parameters(), lr=1e-3)

    # Define everything we want to store
    save_params = {
        'init_params': {
            'input_size': 128,
            'output_size': 10
        },
        'acc_score': 0.98,
        'avg_loss': 0.86,
        'iter_numbers': 100,
        'optim_params': optimizer.state_dict(),
        'model_params': model.state_dict()
    }

    # Create the output directory first; torch.save does not create it
    os.makedirs('model', exist_ok=True)
    # Save the parameters
    torch.save(save_params, 'model/model_params.bin')


def test02():

    # Load the saved parameters
    model_params = torch.load('model/model_params.bin', map_location='cpu')
    # Rebuild the model from its constructor arguments
    model = Model(model_params['init_params']['input_size'], model_params['init_params']['output_size'])
    # Restore the trained weights into the freshly built model
    model.load_state_dict(model_params['model_params'])
    # Rebuild the optimizer and restore its state
    optimizer = optim.Adam(model.parameters())
    optimizer.load_state_dict(model_params['optim_params'])
    # Show the other stored values
    print('Iterations:', model_params['iter_numbers'])
    print('Accuracy:', model_params['acc_score'])
    print('Average loss:', model_params['avg_loss'])


if __name__ == '__main__':
    test01()
    test02()

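In the code above, we store the model's constructor arguments, the model's weights, the number of training iterations, the optimizer's state, and a few other values.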
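To make the stored contents more concrete, the sketch below rebuilds the same two-layer `Model` used in this article and prints what its `state_dict` and the optimizer's `state_dict` actually hold. The variable names `shapes` and `optimizer_keys` are introduced here only for illustration.

```python
import torch.nn as nn
import torch.optim as optim


class Model(nn.Module):
    """Same two-layer architecture as in the article."""

    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, input_size * 2)
        self.linear2 = nn.Linear(input_size * 2, output_size)

    def forward(self, inputs):
        return self.linear2(self.linear1(inputs))


model = Model(128, 10)
optimizer = optim.Adam(model.parameters(), lr=1e-3)

# The model state_dict maps each parameter name to its tensor.
shapes = {name: tuple(tensor.shape) for name, tensor in model.state_dict().items()}
for name, shape in shapes.items():
    print(name, shape)

# The optimizer state_dict holds per-parameter state and hyperparameter groups.
optimizer_keys = sorted(optimizer.state_dict().keys())
print(optimizer_keys)
```

For this model the weight of `linear1` has shape `(256, 128)` and the weight of `linear2` has shape `(10, 256)`. The optimizer's state dictionary has the two top-level keys `param_groups` and `state`.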

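Serializing the model object directly pickles a reference to the model class, so the saved file depends on that class definition and on PyTorch's implementation at save time. Saving only the parameters is much less tightly coupled to PyTorch's implementation, so the second method is the recommended way to store a model.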
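One way to check that the parameter-based approach restores a model faithfully is a round trip: save the `state_dict`, load it into a freshly constructed model, and compare outputs. The sketch below assumes the same `Model` class as above and uses a temporary file. The seed and the random input batch are arbitrary choices made for the example.

```python
import os
import tempfile

import torch
import torch.nn as nn


class Model(nn.Module):
    """Same two-layer architecture as in the article."""

    def __init__(self, input_size, output_size):
        super().__init__()
        self.linear1 = nn.Linear(input_size, input_size * 2)
        self.linear2 = nn.Linear(input_size * 2, output_size)

    def forward(self, inputs):
        return self.linear2(self.linear1(inputs))


torch.manual_seed(0)
original = Model(128, 10)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "weights.bin")
    torch.save(original.state_dict(), path)

    # A new instance starts with different random weights ...
    restored = Model(128, 10)
    # ... until the saved parameters are copied into it.
    restored.load_state_dict(torch.load(path, map_location="cpu"))

original.eval()
restored.eval()
inputs = torch.randn(3, 128)
with torch.no_grad():
    same = torch.equal(original(inputs), restored(inputs))

print("outputs match:", same)
```

Because only tensors and their names are written to disk, the class can be refactored or moved to another module without invalidating the file, as long as the parameter names and shapes stay the same.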