Gradient Descent
Stochastic Gradient Descent
Adaptive Learning Rate Method
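For reference, here is a minimal sketch of the vanilla update these methods build on (the names `gradient_descent` and `grad_f` are illustrative, not from any particular library); stochastic gradient descent uses the same step but estimates the gradient from a random mini-batch:

```python
import numpy as np

def gradient_descent(grad_f, x0, lr=0.1, steps=100):
    """Vanilla gradient descent: repeatedly step against the gradient."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = x - lr * grad_f(x)  # move downhill by a fixed learning rate
    return x

# Example: minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=[0.0])
print(x_min)  # approaches 3.0
```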
Non-Gradient Descent
Particle Swarm Optimization
a population-based method that maintains a set of ‘particles’ which explore the search space, attempting to find a minimum (a minimal sketch follows the list below).
- does not require the optimization problem to be differentiable
- makes few or no assumptions about the problem being optimized and can search very large spaces
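A minimal sketch of the idea, assuming the standard global-best update with inertia (`w`), cognitive (`c1`), and social (`c2`) weights; the coefficient values here are common illustrative defaults, not prescribed by the text:

```python
import numpy as np

def pso(f, bounds, n_particles=30, steps=200, w=0.7, c1=1.5, c2=1.5):
    """Minimal particle swarm: each particle is pulled toward its own
    best position so far and the swarm's best position so far."""
    rng = np.random.default_rng(0)
    lo, hi = np.asarray(bounds[0], float), np.asarray(bounds[1], float)
    x = rng.uniform(lo, hi, size=(n_particles, lo.size))  # positions
    v = np.zeros_like(x)                                  # velocities
    pbest = x.copy()                                      # per-particle bests
    pbest_val = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()]                     # swarm best
    for _ in range(steps):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = np.clip(x + v, lo, hi)
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest

# Example: minimize a non-differentiable absolute-value bowl.
best = pso(lambda p: abs(p[0] - 1) + abs(p[1] + 2), bounds=([-5, -5], [5, 5]))
print(best)  # near [1, -2]
```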
Surrogate Optimization
a method of optimization that attempts to model the loss function with another, well-established function that is cheap to evaluate, then searches that model for the minimum (a sketch follows the list below).
- a non-iterative method: the search runs on the fitted surrogate rather than by repeatedly stepping on the true loss
- can handle large data and non-differentiable optimization problems
- almost always faster than gradient descent methods, but often at the cost of accuracy
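A toy sketch of the approach, assuming a polynomial as the well-established stand-in model (real implementations often use Gaussian processes or radial basis functions instead):

```python
import numpy as np

def surrogate_minimize(f, lo, hi, n_samples=10, degree=2):
    """Fit a cheap polynomial surrogate to a handful of expensive
    evaluations of f, then minimize the surrogate instead of f."""
    xs = np.linspace(lo, hi, n_samples)
    ys = np.array([f(x) for x in xs])     # expensive evaluations of true loss
    coeffs = np.polyfit(xs, ys, degree)   # well-established stand-in model
    surrogate = np.poly1d(coeffs)
    grid = np.linspace(lo, hi, 10_000)    # cheap to evaluate densely
    return grid[surrogate(grid).argmin()]

# Example: a noisy loss that is awkward to differentiate.
noisy = lambda x: (x - 2) ** 2 + 0.1 * np.sin(25 * x)
print(surrogate_minimize(noisy, lo=-5, hi=5))  # near 2
```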
Simulated Annealing
a probabilistic method in which a ‘temperature’ parameter controls how freely the search moves: the temperature is set at some initial positive value and progressively approaches zero.
At each time step, the algorithm randomly picks a candidate solution close to the current one, measures its quality, and moves to it with a probability that depends on the current temperature; higher temperatures make accepting worse solutions more likely.
Ideally, by the time the temperature reaches zero, the algorithm has converged on the global minimum.
- performs especially well when an approximate solution is needed in a short period of time (a sketch follows below)
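A minimal sketch, assuming the classic Metropolis acceptance rule exp(-delta / temperature) and a geometric cooling schedule; both choices are illustrative, not prescribed by the text:

```python
import math
import random

def simulated_annealing(f, x0, temp=1.0, cooling=0.995, steps=5000):
    """Always accept improvements; accept worse moves with probability
    exp(-delta / temp), which shrinks as the temperature decays to zero."""
    x, fx = x0, f(x0)
    best, fbest = x, fx
    for _ in range(steps):
        candidate = x + random.uniform(-1, 1)  # neighbor of the current point
        fc = f(candidate)
        delta = fc - fx
        if delta < 0 or random.random() < math.exp(-delta / temp):
            x, fx = candidate, fc              # move to the candidate
        if fx < fbest:
            best, fbest = x, fx                # track the best point seen
        temp *= cooling                        # temperature approaches zero
    return best

# Example: a bumpy 1-D loss with many local minima.
bumpy = lambda x: x ** 2 + 10 * math.sin(3 * x)
print(simulated_annealing(bumpy, x0=4.0))
```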
Tips
Single Objective Optimization
You can find a collection of single objective optimization test functions implemented in Python here: https://github.com/AxelThevenot/Python_Benchmark_Test_Optimization_Function_Single_Objective