Note on criterion and optimizer

Sarit Ritwirune
2 min read · Aug 24, 2024


I am experimenting with an LSTM to learn my stock market sequence, but this network, with the pair of criterion and optimizer below, does not learn.

import torch
import torch.nn as nn

# LSTM model
class SineLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super(SineLSTM, self).__init__()
        self.hidden_size = hidden_size
        self.num_layers = num_layers
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # x has shape (batch, seq_len, input_size) because batch_first=True
        batch_size = x.size(0)
        # Zero initial hidden and cell states, created on the same device as x
        h0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        c0 = torch.zeros(self.num_layers, batch_size, self.hidden_size).to(x.device)
        # The LSTM returns (output, (h_n, c_n)); only the output is needed here
        out, _ = self.lstm(x, (h0, c0))
        # Predict from the last time step only
        out = self.fc(out[:, -1, :])
        return out

# Initialize model, loss function, and optimizer
# (input_size, hidden_size, num_layers, output_size, and learning_rate are set elsewhere)
model = SineLSTM(input_size, hidden_size, num_layers, output_size)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
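
To show where the criterion and the optimizer actually enter training, here is a minimal sketch of a training loop. It is not my real setup: the sine-wave data, the window length of 20, the hidden size of 32, the learning rate of 0.01, and the 30 full-batch epochs are all assumptions picked so the example runs quickly, and the model is a compact stand-in for the class above.

```python
import math

import torch
import torch.nn as nn

torch.manual_seed(0)  # reproducible weights for the sketch


class SineLSTM(nn.Module):
    """Compact stand-in for the model above (initial states default to zeros)."""

    def __init__(self, input_size, hidden_size, num_layers, output_size):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers, batch_first=True)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        out, _ = self.lstm(x)          # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])  # predict from the last time step


# Assumed toy data: predict the next point of a sine wave from the previous 20.
seq_len = 20
wave = torch.sin(torch.linspace(0, 20 * math.pi, 500))
windows = wave.unfold(0, seq_len + 1, 1)  # sliding windows, shape (480, 21)
x = windows[:, :seq_len].unsqueeze(-1)    # inputs,  shape (480, 20, 1)
y = windows[:, seq_len:]                  # targets, shape (480, 1)

model = SineLSTM(input_size=1, hidden_size=32, num_layers=1, output_size=1)
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

losses = []
for epoch in range(30):
    optimizer.zero_grad()          # clear gradients from the previous step
    loss = criterion(model(x), y)  # the criterion turns predictions into a scalar
    loss.backward()                # backpropagate through the LSTM
    optimizer.step()               # the optimizer updates the weights
    losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```

If the printed loss barely moves on data this simple, the problem is more likely in the loop or the data shapes than in the choice of criterion and optimizer.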

Claude gave me this answer on the possible combinations:

Here are the common combinations of loss functions (criterion) and optimizers that could be used with the LSTM model you provided:

1. MSE Loss with Adam:
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

2. MSE Loss with SGD:
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

3. L1 Loss (Mean Absolute Error) with Adam:
criterion = nn.L1Loss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

4. L1 Loss with SGD:
criterion = nn.L1Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

5. Huber Loss with Adam:
criterion = nn.HuberLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

6. Huber Loss with SGD:
criterion = nn.HuberLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

7. MSE Loss with RMSprop:
criterion = nn.MSELoss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

8. L1 Loss with RMSprop:
criterion = nn.L1Loss()
optimizer = torch.optim.RMSprop(model.parameters(), lr=learning_rate)

These are some of the most common combinations. The choice depends on your specific problem and data characteristics. Would you like me to explain any of these combinations in more detail?
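
One way to try these combinations side by side is a small sweep. The sketch below is only an illustration under assumptions: the `TinyLSTM` model, the `make_data` and `train_once` helpers, the sine-wave data, and the shared learning rate of 0.01 are hypothetical choices of mine, not part of Claude's answer. It pairs every loss with every optimizer, so it also includes Huber with RMSprop, which is not in the list above. Every run is scored with the same MSE so the numbers are comparable even when training used a different loss.

```python
import itertools
import math

import torch
import torch.nn as nn


class TinyLSTM(nn.Module):
    """Small hypothetical model, just large enough for the comparison."""

    def __init__(self, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(1, hidden_size, 1, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])


def make_data(seq_len=20, n_points=300):
    """Assumed toy task: next-point prediction on a sine wave."""
    wave = torch.sin(torch.linspace(0, 12 * math.pi, n_points))
    windows = wave.unfold(0, seq_len + 1, 1)
    return windows[:, :seq_len].unsqueeze(-1), windows[:, seq_len:]


def train_once(criterion_cls, optimizer_cls, lr=0.01, epochs=20, seed=0):
    """Train one (criterion, optimizer) pair and return the final MSE."""
    torch.manual_seed(seed)  # same initial weights for every pair
    model = TinyLSTM()
    criterion = criterion_cls()
    optimizer = optimizer_cls(model.parameters(), lr=lr)
    x, y = make_data()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        # Score with MSE regardless of the training loss
        return nn.functional.mse_loss(model(x), y).item()


criteria = [nn.MSELoss, nn.L1Loss, nn.HuberLoss]
optimizers = [torch.optim.Adam, torch.optim.SGD, torch.optim.RMSprop]

results = {}
for criterion_cls, optimizer_cls in itertools.product(criteria, optimizers):
    key = (criterion_cls.__name__, optimizer_cls.__name__)
    results[key] = train_once(criterion_cls, optimizer_cls)

# Print the pairs from lowest to highest final MSE
for (crit, opt), mse in sorted(results.items(), key=lambda kv: kv[1]):
    print(f"{crit:10s} {opt:8s} final MSE = {mse:.4f}")
```

Using one learning rate for all three optimizers keeps the sketch short, but it is not a fair comparison: plain SGD usually needs a different learning rate than Adam or RMSprop, so a real sweep would tune it per optimizer. A ranking on a toy sine wave also says little about a stock market sequence.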


