Neural Style

An image has two aspects. One is its content, which can be described as the elements or objects in the image; the other is its style, which is more abstract and is usually revealed by the painting skill or technique.

In short, we have two images: one supplies the style and the other supplies the content. We want to combine the style of image 1 with the content of image 2, and this can be achieved with a deep neural network. The method is called Neural Style.

To do so, we simply define a loss function that cares about both style and content:

$$L_{total} = \alpha L_{content} + \beta L_{style}$$

Now let’s look at what the neural network does here, and analyze the effects of $L_{content}$ and $L_{style}$ independently.

Suppose we feed the content image to the neural network; the filters in each layer produce responses to it. We also construct a white-noise image and filter it in the same way, then define a loss $L_{content}$ between the filtered content image and the filtered noise image. Taking the noise image as the input variable, we can update it iteratively by gradient descent.
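The update loop above can be sketched in a few lines of NumPy. This is a toy illustration, not the real method: a fixed random linear map stands in for the frozen, pre-trained network, and a short vector stands in for the image. The key point it shows is that gradient descent updates the *input image*, not the network weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a fixed, pre-trained feature extractor (a frozen conv layer):
# here just a random linear map from "pixels" to filter responses.
W = rng.standard_normal((64, 32))        # 64 filters, 32-pixel image

p = rng.standard_normal(32)              # content image
P = W @ p                                # its feature responses (fixed target)

x = rng.standard_normal(32)              # white-noise image, the only variable
lr = 0.01
for _ in range(500):
    F = W @ x                            # responses of the current image
    grad = W.T @ (F - P)                 # d/dx of 0.5 * ||F - P||^2
    x -= lr * grad                       # update the image, not the network

print(float(np.abs(W @ x - P).max()))    # residual is close to zero
```

After the loop, the responses of the optimized noise image match the content responses almost exactly, which is the reconstruction idea in miniature.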

reconstruction

The image above shows the reconstruction results from different layers. Reconstruction from the lower layers (a, b, c) is almost perfect, while the style reconstruction tends to be more realistic in the deeper layers.

Let’s first get familiar with the notation of the formulation (suppose we are at the $l$-th layer of the network):

  • $\vec p$: original content image (input)
    • $P^l$: content feature representation in layer $l$ with respect to $\vec p$
  • $\vec a$: original style image (input)
    • $A^l$: style representation in layer $l$ with respect to $\vec a$
  • $\vec x$: target image (output)
    • $F^l$: content feature representation in layer $l$ with respect to $\vec x$
    • $G^l$: style representation in layer $l$ with respect to $\vec x$
    • $F^l_{ij}$: response of the $i$-th filter at position $j$ in layer $l$
  • $N_l$: the number of filters in layer $l$
  • $M_l$: the size of a feature map produced by a filter, usually height $\times$ width
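The matrix shapes implied by this notation can be made concrete. In the sketch below (toy dimensions, assumed for illustration), a layer's responses of shape $N_l \times H \times W$ are flattened into the $N_l \times M_l$ matrix $F^l$, with one vectorized feature map per row:

```python
import numpy as np

# Assumed toy dimensions: a layer with N_l = 8 filters over a 5x4 feature map.
N_l, H, W = 8, 5, 4
M_l = H * W                              # M_l = height * width

features = np.random.default_rng(1).standard_normal((N_l, H, W))

# F^l: row i is the vectorized response of filter i, so F has shape N_l x M_l
F = features.reshape(N_l, M_l)
print(F.shape)                           # (8, 20)
```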

The squared-error loss between two content feature representations is:

$$L_{content}(\vec p, \vec x, l) = \frac{1}{2}\sum_{i,j}\left(F^l_{ij} - P^l_{ij}\right)^2$$
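As a quick sanity check, the content loss is just half the sum of squared differences between the two response matrices:

```python
import numpy as np

def content_loss(F, P):
    """Squared-error loss between the target image's responses F^l
    and the content image's responses P^l (both N_l x M_l)."""
    return 0.5 * np.sum((F - P) ** 2)

F = np.array([[1.0, 2.0], [3.0, 4.0]])
P = np.array([[1.0, 0.0], [3.0, 2.0]])
print(content_loss(F, P))                # 0.5 * (0 + 4 + 0 + 4) = 4.0
```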

In each layer, we build a style representation by computing the correlations between the different filter responses. This is called the Gram matrix $G^l \in \mathbb{R}^{N_l \times N_l}$, where $G^l_{ij}$ is the inner product between the vectorized feature maps $i$ and $j$ in layer $l$:

$$G^l_{ij} = \sum_k F^l_{ik} F^l_{jk}$$

Similarly, letting $S^l$ denote the filter responses of the style image $\vec a$ in layer $l$, its Gram matrix is

$$A^l_{ij} = \sum_k S^l_{ik} S^l_{jk}$$
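With the responses stored as an $N_l \times M_l$ matrix, both Gram matrices are a single matrix product:

```python
import numpy as np

def gram(F):
    """Gram matrix G^l_{ij} = sum_k F^l_{ik} F^l_{jk}, for F of shape N_l x M_l."""
    return F @ F.T

F = np.array([[1.0, 2.0], [3.0, 4.0]])
G = gram(F)
print(G)                                 # [[ 5. 11.]
                                         #  [11. 25.]]
```

The same function applied to the style image's responses gives $A^l$.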

The contribution of layer $l$ to the total style loss is

$$E_l = \frac{1}{4 N_l^2 M_l^2}\sum_{i,j}\left(G^l_{ij} - A^l_{ij}\right)^2$$

And the total style loss is a weighted sum over layers:

$$L_{style}(\vec a, \vec x) = \sum_{l=0}^{L} w_l E_l$$
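The two formulas above translate directly into code. This sketch assumes each layer's responses are given as an $N_l \times M_l$ array; the layer weights $w_l$ are supplied by the caller:

```python
import numpy as np

def layer_style_loss(F, S):
    """E_l for one layer: F and S are the target and style responses (N_l x M_l)."""
    N_l, M_l = F.shape
    G, A = F @ F.T, S @ S.T
    return np.sum((G - A) ** 2) / (4 * N_l**2 * M_l**2)

def style_loss(Fs, Ss, ws):
    """L_style = sum_l w_l * E_l over the chosen layers."""
    return sum(w * layer_style_loss(F, S) for F, S, w in zip(Fs, Ss, ws))

# One filter (N_l = 1) over two positions (M_l = 2): G = [[4]], A = [[0]],
# so E_l = (4 - 0)^2 / (4 * 1 * 4) = 1.0
print(layer_style_loss(np.array([[2.0, 0.0]]), np.array([[0.0, 0.0]])))
```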

Let’s look more closely at the gradients of these losses.

The derivative of the content loss with respect to the activations in layer $l$ is

$$\frac{\partial L_{content}}{\partial F^l_{ij}} = \begin{cases} \left(F^l - P^l\right)_{ij} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$

The derivative of the style loss of layer $l$ with respect to the activations is

$$\frac{\partial E_l}{\partial F^l_{ij}} = \begin{cases} \dfrac{1}{N_l^2 M_l^2}\left(\left(F^l\right)^{\mathsf T}\left(G^l - A^l\right)\right)_{ji} & \text{if } F^l_{ij} > 0 \\ 0 & \text{if } F^l_{ij} < 0 \end{cases}$$
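The style gradient is easy to verify numerically. In the sketch below all activations are kept positive so only the first case of the formula applies; written as an $N_l \times M_l$ matrix, the analytic gradient is $(G^l - A^l)F^l / (N_l^2 M_l^2)$, which is compared against a central finite difference on one entry:

```python
import numpy as np

rng = np.random.default_rng(2)
N_l, M_l = 3, 5
# Keep all activations positive so the case F^l_ij > 0 applies everywhere.
F = np.abs(rng.standard_normal((N_l, M_l))) + 0.1
S = np.abs(rng.standard_normal((N_l, M_l))) + 0.1
A = S @ S.T                              # style Gram matrix (fixed target)

def E(F):
    G = F @ F.T
    return np.sum((G - A) ** 2) / (4 * N_l**2 * M_l**2)

# Analytic gradient, arranged as an N_l x M_l matrix
G = F @ F.T
analytic = (G - A) @ F / (N_l**2 * M_l**2)

# Numerical check on one entry via central differences
eps = 1e-6
Fp = F.copy(); Fp[0, 0] += eps
Fm = F.copy(); Fm[0, 0] -= eps
numeric = (E(Fp) - E(Fm)) / (2 * eps)
print(abs(analytic[0, 0] - numeric))     # tiny: the two gradients agree
```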

The final loss function we want to minimize is

$$L_{total}(\vec p, \vec a, \vec x) = \alpha L_{content}(\vec p, \vec x) + \beta L_{style}(\vec a, \vec x)$$
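Putting the pieces together, the whole procedure can be sketched end to end on a toy problem. As before, a random linear map stands in for the frozen network (so $M_l = 1$ and there is a single layer), and the values of $\alpha$, $\beta$, and the learning rate are arbitrary choices for this demo, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(3)
D, N_l = 16, 32                          # "pixels", filters; M_l = 1 here
W_net = rng.standard_normal((N_l, D)) / np.sqrt(D)   # frozen toy "network"

p = rng.standard_normal(D)               # content image
a = rng.standard_normal(D)               # style image
P = W_net @ p                            # content target
A = np.outer(W_net @ a, W_net @ a)       # style target (Gram matrix)

alpha, beta, lr = 1.0, 1e-3, 0.05        # assumed demo values

def total_loss(x):
    F = W_net @ x
    Lc = 0.5 * np.sum((F - P) ** 2)
    Ls = np.sum((np.outer(F, F) - A) ** 2) / (4 * N_l**2)
    return alpha * Lc + beta * Ls

x = rng.standard_normal(D)               # start from white noise
start = total_loss(x)
for _ in range(2000):
    F = W_net @ x
    G = np.outer(F, F)
    g_content = F - P                    # dL_content / dF
    g_style = (G - A) @ F / N_l**2       # dE_l / dF  (M_l = 1)
    x -= lr * W_net.T @ (alpha * g_content + beta * g_style)

print(start, total_loss(x))              # the combined loss drops sharply
```

The gradient with respect to the image $\vec x$ is obtained by chaining the feature-space gradients through the network (here just `W_net.T @ ...`); in the real method this chaining is the network's backward pass.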

Fast Neural Style

FastNet