A deep-learning model is a directed, acyclic graph of layers.

## Layers

A layer is a data-processing component:

- It takes one or more tensors as input and outputs one or more tensors.

Stateless or stateful layers:

- Some layers are stateless, but more frequently layers have a state.
- Stateful layers contain parameters that are learned from the data.

Types of layers:

- Different layers are appropriate for different tensor formats and different types of data processing.
- For example:

| Tensor Type | Data Type | Data Shape | Layer Type | Layer Description |
|---|---|---|---|---|
| 2D tensor | Vector data | `(samples, features)` | `layer_dense` | densely connected layer |
| 3D tensor | Sequence data | `(samples, timesteps, features)` | `layer_lstm` | recurrent layer |
| 4D tensor | Image data | `(samples, height, width, channels)` | `layer_conv_2d` | 2D convolution layer |

Building deep-learning models in Keras:

- It is done by clipping together compatible layers to form useful data-transformation pipelines.

Layer compatibility:

Every layer will only accept input tensors of a certain shape and will return output tensors of a certain shape.

When using Keras, you don't have to worry about compatibility, because the layers you add to your models are dynamically built to match the shape of the incoming layer.
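A minimal sketch of this shape matching (assuming the `keras` R package and a working backend are installed): only the first layer is given an explicit `input_shape`; the second layer is built to accept the 32 outputs of the layer before it.

```
library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(784)) %>%
  layer_dense(units = 10, activation = "softmax")

# summary() reports each layer's output shape: (None, 32), then (None, 10).
# The second layer's weight matrix was sized to match the incoming layer.
summary(model)
```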

## Examples

### Fully connected model for MNIST

```
model <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = c(28 * 28)) %>%
  layer_dense(units = 10, activation = "softmax")
```

### Fully connected model for IMDB

```
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
```

### Fully connected model for the Boston Housing dataset

```
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(13)) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1)
```

## Explaining the code

### The pipe operator

`%>%`

- The pipe operator comes from the `magrittr` package.
- Shorthand for passing the value on its left as the first argument to the function on its right.

Without the pipe, the MNIST model above would be written as:

```
model <- keras_model_sequential()
layer_dense(model, units = 512, activation = "relu", input_shape = c(28*28))
layer_dense(model, units = 10, activation = "softmax")
```

- Besides compactness, the `%>%` reminds us that Keras models are modified in place.
- You don't operate on `model` and then return a new `model` object; rather, you do something to the `model` object.
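A minimal sketch of this in-place behaviour (assuming `keras` is loaded): adding a layer without any reassignment still changes `model` itself.

```
library(keras)

model <- keras_model_sequential()

# No `model <- ...` reassignment here, yet the layer is added to `model`:
model %>% layer_dense(units = 16, activation = "relu", input_shape = c(10))

length(model$layers)  # 1 -- `model` was modified in place
```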

### Linear Stack of Layers

`keras_model_sequential`

- Defines a Keras model composed of a linear stack of layers.

- Not to be confused with a model for sequential data.

### Dense layer

`layer_dense`

- Dense or fully connected layer.

Implements the operation: `output = activation(dot(input, weight) + bias)`

- `input`: 2D input tensor; gets flattened if `rank > 2`.
- `weight`: 2D weight tensor created by the layer.
- `bias`: 1D bias tensor created by the layer (when `use_bias = TRUE`).
- `dot(input, weight)`: Dot product between the two tensors.
- `activation(.)`: Element-wise activation function.
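The operation itself is easy to reproduce in plain R. A minimal sketch with made-up shapes (2 samples, 4 input features, 3 units), using `relu` as the activation:

```
input  <- matrix(rnorm(2 * 4), nrow = 2)   # (samples, features)
weight <- matrix(rnorm(4 * 3), nrow = 4)   # (features, units)
bias   <- rnorm(3)                         # (units)

relu <- function(x) pmax(x, 0)             # element-wise activation

# output = activation(dot(input, weight) + bias)
# sweep() adds bias[j] to column j, i.e. to every sample's j-th unit.
output <- relu(sweep(input %*% weight, 2, bias, `+`))
dim(output)                                # 2 x 3: (samples, units)
```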

Most important arguments:

- `input_shape`: Dimensionality of the input, not including the samples axis. Required only for the first layer in a model.
- `units`: Dimensionality of the output space.
- `activation`: The name of the activation function; defaults to `linear`. `relu(x) = max(x, 0)` is the most commonly used non-linear activation function.

## Some observations

- With neural networks, we can build models that capture complex patterns in the data out of simple, differentiable operations.
- The importance of the non-linear activation function:

Without it, each layer would only be able to learn linear transformations of the input data, and a deep stack of linear layers would still implement a linear operation. The activation function `relu` adds non-linearity to the model, as the sketch below illustrates.
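A minimal sketch of this point (random matrices standing in for trained weights): two stacked linear layers collapse into one linear map, while inserting `relu` breaks that collapse.

```
set.seed(1)
x  <- rnorm(4)                  # input vector
W1 <- matrix(rnorm(16), 4, 4)   # first linear layer
W2 <- matrix(rnorm(16), 4, 4)   # second linear layer

# Two stacked linear layers equal one linear layer with weights W2 %*% W1:
all.equal(W2 %*% (W1 %*% x), (W2 %*% W1) %*% x)  # TRUE

# With relu in between, the mapping is no longer linear, so depth
# actually adds expressive power:
relu <- function(x) pmax(x, 0)
W2 %*% relu(W1 %*% x)
```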

## Reference material

This lecture note is based on (Chollet and Allaire 2018).

## References

Chollet, F., and J. J. Allaire. 2018. *Deep Learning with R*. Manning Publications. https://books.google.no/books?id=xnIRtAEACAAJ.