fully connected layer+CNN has some drawbacks compared with fully CNN even if they have the same number of parameters and time/space cost
- the former modell can only classify object for one whole picture, the later can have multi-classification if possible
- the converlutional layer computation do not depend on input matrix size.
- With sliding windows operation of CNN the location of the object is obtained(Segmentation)
conversion
X ist 2x2 feature map, it is first converted into a 4-dimension array, then do fully connected computation, get classification results Y
it is just the same result if we do a converlutional computation with 4 2x2 kernel on the feature map X
implementation
the converlution operatioin in tensorflow is done with tf.nn.conv2d(), the input matrix and kernel matrix are all 4d matrix (need reshape if not 4d)
input matrix [1,5,5,4] means batch_size=1, shape is 4 channel 5x5 matrix kernel matrix [3,3,4,3] means 4 input channel 3 output channel 3x3 matrix