TensorFlow moving average

Posted by neverset on April 6, 2020

Google says that using averaged parameters sometimes produces significantly better results than the final trained values.

tf.train.ExponentialMovingAverage(decay, num_updates=None, name='ExponentialMovingAverage')

  • decay: the decay rate, usually very close to 1 (e.g. 0.99 or 0.999)
  • num_updates: if num_updates is not set, decay is fixed; if num_updates=global_step, the effective decay becomes min(decay, (1 + num_updates) / (10 + num_updates)), which keeps the average more responsive early in training
  • update rule: new_value = decay × old_value + (1 − decay) × value, as shown in the sketch below
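
To make the update rule concrete, here is a minimal pure-Python sketch of the same arithmetic (the names are illustrative, not part of the TensorFlow API):

    # Pure-Python illustration of the EMA update rule (illustrative names)
    decay = 0.99

    def ema_update(old_value, value, decay):
        return decay * old_value + (1 - decay) * value

    shadow = 10.0                      # initial shadow value
    for value in [11.0, 12.0, 13.0]:
        shadow = ema_update(shadow, value, decay)
        print(shadow)                  # 10.01, 10.0299, 10.059601: the average trails the raw values

    # With num_updates set, TensorFlow uses the smaller of the two rates,
    # so early training steps track the raw values more closely:
    num_updates = 5
    print(min(decay, (1 + num_updates) / (10 + num_updates)))   # 0.4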

Implementation

    import tensorflow as tf  # TensorFlow 1.x API; loss and learning_rate are assumed to be defined

    global_step = tf.Variable(0, trainable=False)  # define global step
    variable_averages = tf.train.ExponentialMovingAverage(decay=0.9999, num_updates=global_step)  # create the moving-average instance
    variable_averages_op = variable_averages.apply(tf.trainable_variables())  # maintain shadow copies of all trainable parameters
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step=global_step)  # minimize() increments global_step by one on each optimization step
    with tf.control_dependencies([train_step]):
        train_op = tf.group(variable_averages_op)  # run the shadow-variable update as part of the training op
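
A self-contained version of the same pattern, runnable as-is (the scalar variable x and the quadratic loss are illustrative stand-ins for a real model); variable_averages.average(x) reads the shadow variable that tracks x:

    import tensorflow as tf  # TensorFlow 1.x

    x = tf.Variable(1.0)                          # illustrative parameter
    loss = tf.square(x - 3.0)                     # illustrative loss
    global_step = tf.Variable(0, trainable=False)
    variable_averages = tf.train.ExponentialMovingAverage(decay=0.99, num_updates=global_step)
    variable_averages_op = variable_averages.apply(tf.trainable_variables())
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss, global_step=global_step)
    with tf.control_dependencies([train_step]):
        train_op = tf.group(variable_averages_op)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(100):
            sess.run(train_op)
        # x converges toward 3.0; the shadow variable lags behind it
        print(sess.run([x, variable_averages.average(x)]))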

Restore from saved model

    variables_to_restore = variable_averages.variables_to_restore()  # map shadow-variable names to model variables
    saver = tf.train.Saver(variables_to_restore)
    saver.restore(sess, save_path)  # the model variables now hold their averaged values
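
For completeness, a sketch of the matching save step and the inference-time restore; the checkpoint path ./model.ckpt is illustrative. variables_to_restore() returns a dict mapping shadow-variable names (e.g. x/ExponentialMovingAverage) to the model variables, so restoring loads the averaged values into the weights used at inference:

    # During training: save a checkpoint; the shadow variables are stored
    # alongside the model variables.
    saver = tf.train.Saver()
    save_path = saver.save(sess, './model.ckpt')  # illustrative path

    # At inference time: rebuild the same graph, then restore the averaged
    # values directly into the model variables.
    variables_to_restore = variable_averages.variables_to_restore()
    saver = tf.train.Saver(variables_to_restore)
    with tf.Session() as sess:
        saver.restore(sess, save_path)
        # each model variable now holds its moving-averaged value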