1-4, Example of time series data modeling process

The COVID-19 epidemic that broke out in 2020 has affected people's lives all over the world in many ways.

Some students were affected in their income, some emotionally, some psychologically, and some in their weight.

Based on China's epidemic data before March 2020, this article builds a time series RNN model to predict when the COVID-19 epidemic in China will end.

import os
import datetime
import importlib
import torchkeras

# Print the current time
def printbar():
    nowtime = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    print("\n"+"=========="*8 + "%s"%nowtime)

# On macOS, running pytorch and matplotlib together in jupyter requires changing this environment variable
os.environ["KMP_DUPLICATE_LIB_OK"]="TRUE"

One, prepare the data

The dataset used in this article is taken from tushare; the method for obtaining it is described in the following article:

https://zhuanlan.zhihu.com/p/109556102

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
%config InlineBackend.figure_format ='svg'

df = pd.read_csv("./data/covid-19.csv",sep = "\t")
df.plot(x = "date",y = ["confirmed_num","cured_num","dead_num"],figsize=(10,6))
plt.xticks(rotation=60);

dfdata = df.set_index("date")
dfdiff = dfdata.diff(periods=1).dropna()
dfdiff = dfdiff.reset_index("date")

dfdiff.plot(x = "date",y = ["confirmed_num","cured_num","dead_num"],figsize=(10,6))
plt.xticks(rotation=60)
dfdiff = dfdiff.drop("date",axis = 1).astype("float32")

dfdiff.head()

Below we implement a custom time series dataset by inheriting from torch.utils.data.Dataset.

torch.utils.data.Dataset is an abstract class. To load custom data, you only need to inherit from this class and override two methods:

  • __len__: implements len(dataset), returning the size of the whole dataset.
  • __getitem__: supports indexing, so that dataset[i] returns the i-th sample in the dataset.

If these two methods are not overridden, an error will be raised.

import torch
from torch import nn
from torch.utils.data import Dataset,DataLoader,TensorDataset


# Use the 8-day window of data before a given day as input to predict that day's data
WINDOW_SIZE = 8

class Covid19Dataset(Dataset):
        
    def __len__(self):
        return len(dfdiff)-WINDOW_SIZE
    
    def __getitem__(self,i):
        x = dfdiff.loc[i:i+WINDOW_SIZE-1,:]
        feature = torch.tensor(x.values)
        y = dfdiff.loc[i+WINDOW_SIZE,:]
        label = torch.tensor(y.values)
        return (feature,label)
    
ds_train = Covid19Dataset()

# The dataset is small, so all training data can be put into one batch to speed up training
dl_train = DataLoader(ds_train,batch_size = 38)
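
Before moving on, it can help to peek at one batch from the dataloader to confirm the tensor shapes. This quick check is not part of the original notebook, just an illustrative sketch:

# Sanity check (illustrative): inspect the shapes of one batch
for features,labels in dl_train:
    print(features.shape)  # expected (batch_size, WINDOW_SIZE, 3)
    print(labels.shape)    # expected (batch_size, 3)
    break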

Two, define the model

There are usually three ways to build a model with Pytorch: using nn.Sequential to build the model layer by layer, inheriting from the nn.Module base class to build a custom model, or inheriting from the nn.Module base class and assembling the model with the help of model container classes.

Here we choose the second way to build the model.

Since a class-form training loop will be used next, we further wrap the model in the Model class from torchkeras to obtain an interface similar to the high-level model API in Keras.

The Model class actually inherits from the nn.Module class.

import torch
from torch import nn
import importlib
import torchkeras

torch.random.seed()

class Block(nn.Module):
    def __init__(self):
        super(Block,self).__init__()
    
    def forward(self,x,x_input):
        # Treat x as a relative growth rate, apply it to the last day of the input
        # window, and clamp at 0 so the predicted daily counts stay non-negative
        x_out = torch.max((1+x)*x_input[:,-1,:],torch.tensor(0.0))
        return x_out
    
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 5-layer LSTM with 3 input features and hidden size 3
        self.lstm = nn.LSTM(input_size = 3, hidden_size = 3, num_layers = 5, batch_first = True)
        self.linear = nn.Linear(3,3)
        self.block = Block()
        
    def forward(self,x_input):
        x = self.lstm(x_input)[0][:,-1,:]
        x = self.linear(x)
        y = self.block(x,x_input)
        return y
        
net = Net()
model = torchkeras.Model(net)
print(model)

model.summary(input_shape=(8,3),input_dtype = torch.FloatTensor)
    
Net(
  (lstm): LSTM(3, 3, num_layers=5, batch_first=True)
  (linear): Linear(in_features=3, out_features=3, bias=True)
  (block): Block()
)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
              LSTM-1                  [-1, 8, 3]             480
            Linear-2                     [-1, 3]              12
             Block-3                     [-1, 3]               0
================================================================
Total params: 492
Trainable params: 492
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.000092
Forward/backward pass size (MB): 0.000229
Params size (MB): 0.001877
Estimated Total Size (MB): 0.002197
----------------------------------------------------------------

Three, train the model

Training a model in Pytorch usually requires the user to write a custom training loop, and the code style of training loops varies from person to person.

There are three typical styles of training loop code: script form, function form, and class form.

Here a class-form training loop modeled on the Keras style is used.

We define a high-level model interface Model, modeled on Keras, which implements the fit, validate, predict and summary methods; it is effectively a user-defined high-level API.

Note: recurrent neural networks are relatively difficult to debug. It is usually necessary to try several different learning rates to get good results.
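
For comparison, the sketch below shows roughly what a script-form training loop for this model could look like. It is illustrative only and is not run in this tutorial; the optimizer, loss formula and epoch count mirror the torchkeras settings used below, and the class-form loop via model.fit does this bookkeeping for us.

# Illustrative script-form training loop (a sketch, not run in this tutorial);
# assumes net and dl_train are defined above
optimizer_sketch = torch.optim.Adagrad(net.parameters(),lr = 0.1)

def mspe_sketch(y_pred,y_true):
    # mean squared percentage error, same formula as the mspe loss defined below
    err_percent = (y_true-y_pred)**2/(torch.max(y_true**2,torch.tensor(1e-7)))
    return torch.mean(err_percent)

for epoch in range(100):
    for features,labels in dl_train:
        optimizer_sketch.zero_grad()
        predictions = net(features)
        loss = mspe_sketch(predictions,labels)
        loss.backward()
        optimizer_sketch.step()
    if (epoch+1)%10==0:
        print("epoch =",epoch+1,"loss =",loss.item())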

# Mean squared percentage error loss
def mspe(y_pred,y_true):
    err_percent = (y_true-y_pred)**2/(torch.max(y_true**2,torch.tensor(1e-7)))
    return torch.mean(err_percent)

model.compile(loss_func = mspe,optimizer = torch.optim.Adagrad(model.parameters(),lr = 0.1))
dfhistory = model.fit(100,dl_train,log_step_freq=10)

Four, evaluate the model

Evaluating a model generally requires a validation set or a test set. Since this example has little data, we only visualize how the loss function evolved on the training set.

%matplotlib inline
%config InlineBackend.figure_format ='svg'

import matplotlib.pyplot as plt

def plot_metric(dfhistory, metric):
    train_metrics = dfhistory[metric]
    epochs = range(1, len(train_metrics) + 1)
    plt.plot(epochs, train_metrics,'bo--')
    plt.title('Training '+ metric)
    plt.xlabel("Epochs")
    plt.ylabel(metric)
    plt.legend(["train_"+metric])
    plt.show()
plot_metric(dfhistory,"loss")

Five, use the model

Here we use the model to predict when the epidemic will end, that is, the day on which the number of newly confirmed cases drops to zero.

# Use dfresult to record the existing data together with the predicted epidemic data
dfresult = dfdiff[["confirmed_num","cured_num","dead_num"]].copy()
dfresult.tail()

# Predict the trend of new cases over the next 200 days and append the results to dfresult
for i in range(200):
    arr_input = torch.unsqueeze(torch.from_numpy(dfresult.values[-38:,:]),dim=0)
    arr_predict = model.forward(arr_input)

    dfpredict = pd.DataFrame(torch.floor(arr_predict).data.numpy(),
                columns = dfresult.columns)
    # DataFrame.append was removed in pandas 2.0, so use pd.concat instead
    dfresult = pd.concat([dfresult,dfpredict],ignore_index=True)
dfresult.query("confirmed_num==0").head()

# New confirmed cases drop to 0 from about the 50th day; day 45 corresponds to March 10, so this is about 5 days later, i.e. new confirmed cases are predicted to drop to 0 around March 15
# Note: this forecast is optimistic

dfresult.query("cured_num==0").head()

# New cured cases drop to 0 from about the 132nd day; day 45 corresponds to March 10, so this is about 3 months later, i.e. everyone would be cured around June 10
# Note: this forecast is pessimistic, and it has a problem: adding up the predicted daily cured counts exceeds the cumulative number of confirmed cases

dfresult.query("dead_num==0").head()

# New deaths drop to 0 from about the 50th day; day 45 corresponds to March 10, so this is about 5 days later, i.e. new deaths are predicted to drop to 0 around March 15
# Note: this forecast is optimistic
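
The day-to-date arithmetic in the comments above can also be done in code. The helper below is a sketch only (the name index_to_date is made up here); it assumes df and dfdiff from the data-preparation step are still in scope, that row i of dfdiff corresponds to row i+1 of df, and that the date column can be parsed by pd.to_datetime:

# Map a row index of dfresult to a calendar date (illustrative helper)
def index_to_date(i):
    last_known = pd.to_datetime(df["date"].iloc[-1])
    if i < len(dfdiff):
        # rows within the observed data: diff() dropped the first day of df
        return pd.to_datetime(df["date"].iloc[i+1])
    # rows beyond the observed data are predicted days after the last known date
    return last_known + pd.Timedelta(days = i-len(dfdiff)+1)

first_zero_day = dfresult.query("confirmed_num==0").index[0]
print(index_to_date(first_zero_day))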

Six, save the model

It is recommended to save a Pytorch model by saving its parameters (state_dict).

print(model.net.state_dict().keys())

# Save the model parameters
torch.save(model.net.state_dict(), "./data/model_parameter.pkl")

# Load the saved parameters into a fresh network and wrap it again
net_clone = Net()
net_clone.load_state_dict(torch.load("./data/model_parameter.pkl"))
model_clone = torchkeras.Model(net_clone)
model_clone.compile(loss_func = mspe)

# Evaluate the restored model on the training data
model_clone.evaluate(dl_train)
{'val_loss': 4.254558563232422}
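
As a quick check that the parameters were restored correctly (not part of the original notebook), one can compare the outputs of the original and cloned networks on a single sample:

# The cloned network should reproduce the original network's output exactly
x,_ = ds_train[0]
with torch.no_grad():
    print(torch.allclose(net(x.unsqueeze(0)),net_clone(x.unsqueeze(0))))  # expect True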

If this book has been helpful to you and you would like to encourage the author, remember to give this project a star⭐️ and share it with your friends 😊!

If you would like to discuss your understanding of the book's content with the author further, please leave a message under the official account "Algorithm Food House". The author has limited time and energy and will respond as appropriate.

You can also reply with the keyword "add group" in the official account backend to join the reader exchange group and discuss with other readers.
