Gradient Descent Optimization With AdaMax From Scratch