softmax function
softmax is used to transform multiple output values into the (0, 1) range so that they sum to 1 and the class with the maximum probability can be selected
the mathematical expression is: a_i = exp(z_i) / sum_k exp(z_k), where z is the vector of logits
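a minimal NumPy sketch of this formula (the variable names here are purely illustrative):
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; the result is mathematically unchanged
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

z = np.array([2.0, 1.0, 0.1])   # example logits
a = softmax(z)                  # ~[0.659, 0.242, 0.099], sums to 1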
loss and gradient
the loss is the cross-entropy function: L = -sum_i y_i * log(a_i)
y: true label (one-hot encoded)
a: predicted probability from the softmax
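for a one-hot y the sum collapses to a single term, so the loss is just the negative log-probability of the true class; a tiny numeric sketch (values are made up):
import numpy as np
y = np.array([0.0, 1.0, 0.0])   # true class is index 1
a = np.array([0.2, 0.7, 0.1])   # softmax output
loss = -np.sum(y * np.log(a))   # = -log(0.7) ≈ 0.357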
Partial derivative of the loss with respect to the logits z (i is the true class, j is the index being differentiated):
if j != i: dL/dz_j = a_j
if j == i: dL/dz_i = a_i - 1
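for a one-hot y the two cases combine to dL/dz_j = a_j - y_j; a quick sketch checking this against numerical differentiation (the NumPy softmax is repeated so the snippet is self-contained):
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))
    return e / np.sum(e)

def loss(z, y):
    return -np.sum(y * np.log(softmax(z)))

z = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])        # true class i = 0
analytic = softmax(z) - y            # a_j for j != i, a_i - 1 for j = i

eps = 1e-6
numeric = np.array([(loss(z + eps * np.eye(3)[j], y) - loss(z - eps * np.eye(3)[j], y)) / (2 * eps)
                    for j in range(3)])
# analytic and numeric agree to ~1e-6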
implementation
logits: the unnormalized scores (not yet probabilities), e.g. y = tf.matmul(x, w) + b
-
tf.nn.softmax(logits, name=None)
y_hat = …  # predicted logits, e.g. y = tf.matmul(x, w) + b
y_true = ..  # true label, one-hot encoded
y_hat_softmax = tf.nn.softmax(y_hat)
total_loss = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(y_hat_softmax), [1]))
-
tf.nn.softmax_cross_entropy_with_logits(labels, logits, name=None)
labels is one-hot encoded
gradients are propagated back only to the logits, not to the labels
total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))
difference to sigmoid_cross_entropy_with_logits: sigmoid_cross_entropy_with_logits computes the cross entropy of independent, not mutually exclusive targets (multi-label), so classification with exactly one true class is not what it is meant for; in both ops the input logits must be raw scores, not already passed through sigmoid/softmax
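a short sketch (assuming TensorFlow 1.x, as in the rest of these notes; the example tensors are made up) showing that the built-in op matches the manual softmax + cross entropy above:
import tensorflow as tf

y_hat  = tf.constant([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])   # example logits
y_true = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # one-hot labels

manual  = tf.reduce_mean(-tf.reduce_sum(y_true * tf.log(tf.nn.softmax(y_hat)), [1]))
builtin = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_true, logits=y_hat))

with tf.Session() as sess:
    print(sess.run([manual, builtin]))   # both values agree up to float precision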
-
tf.nn.softmax_cross_entropy_with_logits_v2(labels, logits, name=None)
labels is one-hot encoded
this op propagates gradients in the backward pass to both logits and labels, which is useful when the labels are themselves generated, e.g. in a GAN; the propagation into the labels can be switched off by wrapping them in tf.stop_gradient(labels)
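a minimal sketch (TensorFlow 1.x) of switching off the label gradient of the _v2 op; the soft labels here are made up purely for illustration:
import tensorflow as tf

logits = tf.Variable([[2.0, 1.0, 0.1]])
labels = tf.Variable([[0.7, 0.2, 0.1]])   # e.g. generated/soft labels that are themselves trainable

# gradients flow into both logits and labels
loss_both = tf.nn.softmax_cross_entropy_with_logits_v2(labels=labels, logits=logits)

# wrapping the labels in tf.stop_gradient restores the old behaviour: only the logits get updated
loss_logits_only = tf.nn.softmax_cross_entropy_with_logits_v2(
    labels=tf.stop_gradient(labels), logits=logits)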
-
tf.nn.sparse_softmax_cross_entropy_with_logits(labels, logits, name=None)
labels are the class indices, not a one-hot encoded array
so, to use one-hot labels with the sparse op, convert them first:
labels_not_hot_coded = tf.argmax(labels_hot_coded, 1)
tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels_not_hot_coded, logits=logits)
is the same as
tf.nn.sparse_softmax_cross_entropy_with_logits(labels=tf.argmax(labels_hot_coded, 1), logits=logits)
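a quick sketch (TensorFlow 1.x, made-up tensors) checking that the sparse op fed with argmax indices gives the same per-example losses as the one-hot op:
import tensorflow as tf

logits = tf.constant([[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]])
labels_hot_coded = tf.constant([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

dense_loss  = tf.nn.softmax_cross_entropy_with_logits(labels=labels_hot_coded, logits=logits)
sparse_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=tf.argmax(labels_hot_coded, 1), logits=logits)

with tf.Session() as sess:
    print(sess.run([dense_loss, sparse_loss]))   # identical per-example losses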