Tensor: Data representation for neural networks

Currently, most (if not all) current ML systems use tensors as their basic data structure.

Tensors are a generalisation of vectors and matrices to an arbitrary number of dimensions.

A tensor is defined by 3 key attributes:

Abstract examples

Scalars (0D tensors)

  • R doesn’t have a data type to represent scalars. All numeric objects are vectors, matrices, or arrays.

  • But an R vector that is always length 1 is conceptually similar to a scalar.

# scalar
tensor <- as.array(1)
# rank
length(dim(tensor))
## [1] 1
# shape
dim(tensor)
## [1] 1
# data type
typeof(tensor)
## [1] "double"

Vectors (1D tensors)

  • Don’t confuse a 5D vector with a 5D tensor!
# vector
tensor <- as.array(c(12, 3, 6, 14, 10))
# rank
length(dim(tensor))
## [1] 1
# shape
dim(tensor)
## [1] 5
# data type
typeof(tensor)
## [1] "double"

Matrices (2D tensors)

  • A matrix has two axes (often referred to as rows and columns)
tensor <- matrix(rep(0, 3*5), nrow = 3, ncol = 5)
# rank
length(dim(tensor))
## [1] 2
# shape
dim(tensor)
## [1] 3 5
# data type
typeof(tensor)
## [1] "double"

3D tensors and higher-dimensional tensors

tensor <- array(rep(0, 2*3*2), dim = c(2,3,2))
# rank
length(dim(tensor))
## [1] 3
# shape
dim(tensor)
## [1] 2 3 2
# data type
typeof(tensor)
## [1] "double"

Concrete data examples

Vector data

  • 2D tensor of shape (samples, features)

  • Data structure most commonly used by statistical packages

  • Usually, this is the expected data structure when fitting fully connected neural networks

  • Example: The Boston housing price dataset

str(boston_train_data)
##  num [1:404, 1:13] 1.2325 0.0218 4.8982 0.0396 3.6931 ...
  • More complex data types can be mapped to vector data format.
    • Image data: Usually represented in 3D tensors can be flatened into a 2D tensor

Timeseries or sequence data

  • 3D tensor of shape (samples, timesteps, features)
  • Stock price dataset
    • Each minute we record the current price, last minute lowest price and highest price
    • A trading day has 390 minutes and a trading year has 250 days
    • One year of data can be store in a 3D tensor (250, 390, 3)
  • A statistician would usually store in a 2D tensor with a date column.

Image data

  • 4D tensor of shape (samples, height, width, channels)
  • Grayscale images (like our MNIST digits) have only a single color channel, but are still stored in 4D tensors by convention.

Note: Unlike TensorFlow, Theano uses the convention (samples, channels, height, width)

Video data

  • 5D tensor of shape (samples, frames, height, width, channels)

  • A video can be understood as a sequence of frames, each frame being a color image.

Tensor operations

output = relu(dot(W, input) + b)

Element-wise operations

  • Operations that are applied independently to each entry in the tensors being considered

  • Massively parallel implementations available (vectorised implementations)

Operations involving tensors of different dimensions

  • The R sweep() function enables you to perform operations between higher-dimension tensors and lower-dimension tensors
# Add vector y for each column of matrix x
sweep(x, 2, y, `+`)

Tensor dot

  • Most common and useful tensor operation

  • An element-wise product is done with the * operator in R, whereas dot products use the %*% operator

  • Mathematically, what does the dot operation do?

# dot product between two vector of the same length
naive_vector_dot <- function(x, y) {          
  z <- 0
  for (i in 1:length(x))
    z <- z + x[[i]] * y[[i]]
  z
}
# dot product between a matrix x and a vector y
naive_matrix_vector_dot <- function(x, y) {
  z <- rep(0, nrow(x))
  for (i in 1:nrow(x))
    row_x <- x[i,]    
    z[[i]] <- naive_vector_dot(row_x, y)
  z
}
# dot product between a matrix x and a matrix y
naive_matrix_dot <- function(x, y) {                   1
  z <- matrix(0, nrow = nrow(x), ncol = ncol(y))
  for (i in 1:nrow(x))
    for (j in 1:ncol(y)) {
      row_x <- x[i,]
      column_y <- y[,j]
      z[i, j] <- naive_vector_dot(row_x, column_y)
    }
  z
}
  • The dot product generalizes for higher dimensions, given the apropriate shape compatibility observed before:
    • (a, b, c, d) . (d) -> (a, b, c)
    • (a, b, c, d) . (d, e) -> (a, b, c, e)
    • And so on.

Tensor reshaping

  • Always use the array_reshape() function when reshaping R arrays that will be passed to Keras.
    • Uses row-major semantics (as opposed to R’s default column-major semantics)
    • Compatible with the way the numerical libraries called by Keras (NumPy, TensorFlow, and so on) interpret array dimensions.
x <- matrix(c(0, 1,
              2, 3,
              4, 5),
            nrow = 3, ncol = 2, byrow = TRUE)
x
##      [,1] [,2]
## [1,]    0    1
## [2,]    2    3
## [3,]    4    5
# R uses column-major semantics by default
dim(x) <- c(6,1)
x
##      [,1]
## [1,]    0
## [2,]    2
## [3,]    4
## [4,]    1
## [5,]    3
## [6,]    5
x <- matrix(c(0, 1,
              2, 3,
              4, 5),
            nrow = 3, ncol = 2, byrow = TRUE)
x
##      [,1] [,2]
## [1,]    0    1
## [2,]    2    3
## [3,]    4    5
# array_reshape uses row-major semantics
x <- array_reshape(x, dim = c(6, 1))
x
##      [,1]
## [1,]    0
## [2,]    1
## [3,]    2
## [4,]    3
## [5,]    4
## [6,]    5

Reference material

This lecture note is based on (Chollet and Allaire 2018).

References

Chollet, F., and J. Allaire. 2018. Deep Learning with R. Manning Publications. https://books.google.no/books?id=xnIRtAEACAAJ.