Introduction And Implementation To Neural Style Transfer – Deep Learning

This article was published as a part of the Data Science Blogathon.


Deep Learning is a subset of Machine Learning based on Artificial Neural Networks, whose design is loosely inspired by the way neurons in the human brain work. Use cases of Deep Learning include Face Recognition, Machine Translation, Speech Recognition, etc. Learning can be supervised, semi-supervised, or unsupervised.

What is Neural Style Transfer?

If you are an artist, you have probably wondered: what if I could paint like Picasso? Deep Learning offers an interesting answer to that question: Neural Style Transfer.

In layman’s terms, Neural Style Transfer (NST) is the art of applying a style to any content. The content is the layout or the sketch, and the style is the painting technique or the colors. It is an application of image transformation using Deep Learning.

How does it work?

Unsurprisingly, there have been quite a few approaches to NST, but we will start with the traditional implementation for a basic understanding and then explore further.

The base idea behind Neural Style Transfer is that “it is possible to separate the style representation and content representation in a CNN, learned during a computer vision task (e.g. an image recognition task).”

You have probably heard of the ImageNet competition, which introduced state-of-the-art models such as AlexNet, then VGG, then ResNet, and many more. What all these models have in common is that they are trained on the large ImageNet dataset (the full dataset has about 14 million images; the competition subset has about 1.2 million training images in 1,000 classes), which gives them rich, general-purpose features for understanding images. We leverage this quality by separating the content and style parts of an image and defining a loss function that is optimized to produce the required result.

As stated earlier, we use a pre-trained convolutional model and loss functions that blend two images visually, so we need the following inputs:

A Content Image – the image onto which we will transfer the style

A Style Image – the style we want to transfer

An Input Image (generated) – the final image, combining the content with the required style


As mentioned, we will be using a pre-trained convolutional neural network. Transfer learning lets us cut this process short: libraries like Keras provide these giant pre-trained models and let us experiment with them on our own problem statements. Here we will be using Keras for transfer learning, and loading the model takes only a few lines of code.

First, we import Keras and its vgg19 module. Then we load the model using vgg19.VGG19(), where include_top=False means that we don’t want the fully connected classifier at the top of the network, which ends in the softmax output layer used to classify the 1000 classes of the competition.

Next, we make a dictionary whose keys are layer names and whose values are the corresponding layer outputs. Finally, we define our feature-extraction model with the VGG input specification as its inputs and this dictionary of per-layer outputs as its outputs.
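A minimal sketch of the loading step described above, assuming TensorFlow 2.x (tf.keras) is installed. The function name build_feature_extractor is a hypothetical helper of our own. The Keras calls follow the steps in the text, and the imports sit inside the function only so the snippet can be loaded without TensorFlow present.

```python
def build_feature_extractor(weights="imagenet"):
    """Return a Keras model that maps an image batch to {layer_name: activation}.

    Hypothetical helper; assumes TensorFlow 2.x with tf.keras is installed.
    With weights="imagenet", Keras downloads the pre-trained VGG19 weights.
    """
    # Imported lazily so this module can be imported without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras.applications import vgg19

    # include_top=False drops VGG19's fully connected classifier head.
    model = vgg19.VGG19(weights=weights, include_top=False)

    # Dictionary of layer name -> symbolic layer output.
    outputs_dict = {layer.name: layer.output for layer in model.layers}

    # Same inputs as VGG19, but every layer's activation is exposed as an output.
    return keras.Model(inputs=model.inputs, outputs=outputs_dict)
```

Calling feature_extractor = build_feature_extractor() and then feature_extractor(images) on a batch preprocessed with vgg19.preprocess_input would return the per-layer activations as a dictionary.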

Next, we will define the layers from which we will extract our content and style characteristics.

Because we have already built the dictionary of layer outputs, we can look these layers up by name and extract their outputs.
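The layer choice itself is not shown in the text, so the lists below are an assumption. They use the layer names Keras gives VGG19 (blockN_convM) and mirror the choice made in the Keras style-transfer example: the first convolution of every block for style, and a deep layer for content. split_features is a hypothetical helper that looks these layers up in the dictionary of outputs.

```python
# Assumed layer choice (VGG19 layer names as used by Keras).
STYLE_LAYER_NAMES = [
    "block1_conv1",
    "block2_conv1",
    "block3_conv1",
    "block4_conv1",
    "block5_conv1",
]
CONTENT_LAYER_NAME = "block5_conv2"


def split_features(features):
    """Pick the content and style activations out of a {layer_name: activation} dict."""
    content = features[CONTENT_LAYER_NAME]
    style = [features[name] for name in STYLE_LAYER_NAMES]
    return content, style
```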

Loss Functions

To get the desired image, we define a loss function and optimize it towards the required result. The loss terms below compare images element by element, in the spirit of per-pixel losses, but they operate on CNN feature maps rather than on raw pixels.

A per-pixel loss is a metric that measures the difference between two images at the pixel level, comparing the output pixel values with the target values. (Perceptual loss functions, another method, are discussed briefly later in the blog.) Per-pixel losses have their drawbacks: they do not capture every perceptually meaningful characteristic, and an image shifted by a single pixel can score poorly even though it looks almost identical. That is where perceptual losses, which compare images through the feature maps of a pre-trained network, come into the picture. The loss terms we will focus on are:
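To make the drawback concrete, here is a small NumPy sketch (our own illustration, not code from the original post) of a per-pixel mean squared error. A striped image and the same image shifted by one column look nearly identical, yet every pixel differs, so the per-pixel loss is as large as it can be for values in [0, 1].

```python
import numpy as np


def per_pixel_loss(a, b):
    """Mean squared difference between corresponding pixels of two images."""
    return np.mean((a.astype(float) - b.astype(float)) ** 2)


# A synthetic 8x8 image of vertical stripes, and the same image shifted by one column.
stripes = np.tile([0.0, 1.0], (8, 4))          # every row is 0, 1, 0, 1, ...
shifted = np.roll(stripes, shift=1, axis=1)     # every row is 1, 0, 1, 0, ...

print(per_pixel_loss(stripes, stripes))         # identical images: zero loss
print(per_pixel_loss(stripes, shifted))         # every pixel differs by 1: maximal loss
```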

Content Loss

Style Loss

Content Loss

The content loss makes sure that the content we want is captured in the generated image. It has been observed that the higher layers of a CNN capture information about the content (objects and their arrangement), whereas the lower layers are more focused on the individual pixel values.

In the content loss, base is the feature map of the content image, while combination is the feature map of the generated output image. The function tf.reduce_sum sums the elements of its argument across the specified dimensions (all of them by default); here the argument is the squared difference between corresponding elements of the content and generated feature maps.
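The same content loss written in NumPy, as a sketch for readers without TensorFlow at hand; with TensorFlow, np.sum and np.square become tf.reduce_sum and tf.square. The feature maps are assumed to be arrays of shape (height, width, channels).

```python
import numpy as np


def content_loss(base, combination):
    """Sum of squared element-wise differences between two feature maps.

    base: features of the content image; combination: features of the generated image.
    """
    return np.sum(np.square(combination - base))


base = np.zeros((2, 2, 3))
combination = np.ones((2, 2, 3))
print(content_loss(base, combination))  # 12 elements, each differing by 1
```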

Style Loss

Defining the loss function for style takes more work than for content, as multiple layers are involved in the computation. The style information is measured as the amount of correlation between the feature maps within each layer. Here we use the Gram matrix to compute the style loss. So what is a Gram matrix?

The Gram matrix captures the distribution of features over the set of feature maps in a given layer. So when you minimize the style loss, you are making the distribution of features the same in the style image and the generated image.

So the idea is to compute the Gram matrices of the style image and the generated image, and then compute the difference between the two. The Gram matrix entry G_ij is obtained by multiplying the i-th and j-th feature maps of a layer element-wise and summing over height and width: G_ij = Σ_h Σ_w F_hwi · F_hwj, where F is the layer’s feature tensor.
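A NumPy sketch of the Gram matrix and of a per-layer style loss. Reshaping the (height, width, channels) feature tensor so that each column is one flattened feature map turns the sum over height and width into a single matrix product. The normalization constant 4·N²·M² (N channels, M = height × width) follows the original Gatys et al. formulation and is an assumption here, since the post does not show it.

```python
import numpy as np


def gram_matrix(features):
    """Gram matrix of a (height, width, channels) feature tensor."""
    h, w, c = features.shape
    flat = features.reshape(h * w, c)   # column i = feature map i, flattened
    return flat.T @ flat                # G[i, j] = sum over h, w of F[h, w, i] * F[h, w, j]


def style_loss(style_features, generated_features):
    """Squared Gram-matrix difference for one layer, normalised by 4 * N^2 * M^2."""
    h, w, c = style_features.shape
    diff = gram_matrix(style_features) - gram_matrix(generated_features)
    return np.sum(np.square(diff)) / (4.0 * c ** 2 * (h * w) ** 2)


f = np.ones((2, 2, 3))
print(gram_matrix(f))                   # every entry is 4: four positions, each 1 * 1
print(style_loss(f, f))                 # identical Gram matrices give zero loss
```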

With both loss functions computed, the final loss is a weighted sum of the content and style losses.
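A sketch of the weighted sum, with the weights as free hyperparameters (the values in the usage line are arbitrary). Following the Keras example, the style weight is split evenly across the style layers; that split is an assumption, not something the text specifies.

```python
def total_loss(content_loss_value, style_loss_values, content_weight, style_weight):
    """Weighted sum of one content-loss term and several per-layer style-loss terms."""
    # Assumption: the style weight is shared equally among the style layers.
    per_layer_style_weight = style_weight / len(style_loss_values)
    return content_weight * content_loss_value + per_layer_style_weight * sum(style_loss_values)


# Arbitrary example values: content loss 12.0, two style layers with loss 0.25 each.
print(total_loss(12.0, [0.25, 0.25], content_weight=2.0, style_weight=1.0))
```

Raising style_weight relative to content_weight pushes the result towards heavier stylization, and lowering it keeps the output closer to the content image.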

In the final integration of the losses, we traverse the style layers, add up their style losses, and combine them with the content loss in a weighted sum. Finally, we define an optimizer (Adam or SGD) to minimize this loss. Note that the weights of the pre-trained network stay fixed: what the optimizer updates are the pixels of the generated image.
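A toy NumPy sketch of that last point: the optimizer updates the image, not the network. To keep it self-contained, we assume the feature extractor is the identity map and use only the content loss sum((x - base)²), whose gradient with respect to the image x is 2(x - base). A real implementation would get its gradients through VGG19 with automatic differentiation (for example tf.GradientTape) and would typically use Adam or SGD with a learning-rate schedule.

```python
import numpy as np


def optimise_image(base, steps=100, lr=0.1, seed=0):
    """Plain gradient descent on the pixels of a generated image.

    Toy assumption: features are the raw pixels, so the content loss is
    sum((x - base)**2) and its gradient with respect to x is 2 * (x - base).
    """
    rng = np.random.default_rng(seed)
    x = rng.random(base.shape)          # start from random noise
    for _ in range(steps):
        grad = 2.0 * (x - base)         # gradient of the content loss w.r.t. the pixels
        x = x - lr * grad               # SGD step on the image itself
    return x


base = np.full((4, 4, 3), 0.5)          # a flat grey "content image"
result = optimise_image(base)
print(np.abs(result - base).max())      # the error shrinks by a factor of 0.8 per step
```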



Other Approaches

There are many other, faster proposals for NST that are worth exploring. One of them uses perceptual losses to train an image transformation network, which speeds up NST considerably: the network is trained once per style and can then stylize any content image without retraining.

This is more helpful in deployment environments, as the traditional approach runs a separate optimization for each pair of content and style images, while this approach trains once per style and then stylizes new content images with a single forward pass through the network.


Training a style transfer model requires two networks: a pre-trained feature extractor and a transfer network. The pre-trained feature extractor is used to avoid having to use paired training data. Its usefulness arises from the curious tendency for individual layers of deep convolutional neural networks trained for image classification to specialize in understanding specific features of an image.

The pre-trained model enables us to compare the content and style of two images, but it doesn’t actually help us create the stylized image. That’s the job of a second neural network, which we’ll call the transfer network. The transfer network is an image translation network that takes one image as input and outputs another image. Transfer networks typically have an encoder-decoder architecture.

At the beginning of training, one or more style images are run through the pre-trained feature extractor, and the outputs at various style layers are saved for later comparison. Content images are then fed into the system. Each content image passes through the pre-trained feature extractor, where outputs at various content layers are saved. The content image then passes through the transfer network, which outputs a stylized image. The stylized image is also run through the feature extractor, and outputs at both the content and style layers are saved.

The quality of the stylized image is defined by a custom loss function that has terms for both content and style. The extracted content features of the stylized image are compared to the original content image, while the extracted style features are compared to those from the reference style image(s). After each step, only the transfer network is updated. The weights of the pre-trained feature extractor remain fixed throughout. By weighting the different terms of the loss function, we can train models to produce output images with lighter or heavier stylization.



Congratulations, you have learned what Neural Style Transfer is and how it works. But that is certainly not the end: next comes exploring the topic through more recent research papers, blogs, and faster implementations, and you now have a head start for that. I hope you enjoyed this blog, which covered the basic traditional workflow of Neural Style Transfer, and that it helped you build an intuition for NST.

Thank you for reading!



Introduction To Deep Learning In Julia

This article was published as a part of the Data Science Blogathon


Currently, the data science field is dominated by Python and R, but a newer competitor joined not long ago: Julia, which we will be exploring in this guide. Julia’s famous motto is:

Looks like Python, runs like C

Python is used for a wide range of tasks. Julia, on the other hand, was primarily developed for scientific computation, machine learning, and statistical tasks.

Since Julia was explicitly designed for high-level statistical work and scientific computation, it has several benefits over Python. In linear algebra, for example, “vanilla” (raw) Julia performs better than “vanilla” (raw) Python. This is primarily because Julia has built-in multidimensional arrays and linear algebra and compiles code to fast native machine code, whereas plain Python has no native matrix type and relies on packages such as NumPy for this kind of work.

Python is a great language, especially with its NumPy library, but Julia offers this kind of performance out of the box, without relying on external packages, and is more geared towards machine learning tasks and numerical computation.

Table of contents




CUDA Arrays

Automatic Differentiation

Training a Classifier


So, let’s get started!


This guide will get you started with the mechanics of Flux so that you can start building models right away. While it is loosely based on a PyTorch tutorial, it covers all the necessary areas. It introduces basic Julia programming as well as Zygote, a source-to-source automatic differentiation (AD) framework in Julia. Using these tools, we will build a simple neural network and, in the end, a CNN that we will train to classify images into 10 classes.

What is Flux in Julia?

Flux is an open-source machine learning library written entirely in Julia. The stable release we will be using is v0.12.4. As one might expect, it offers a layer-stacking interface for simple models, and instead of a monolithic design it relies on strong interoperability with other Julia packages. For example, if we need GPU support, we get it directly through CUDA arrays (CuArrays). This is in complete contrast to frameworks that are implemented in other languages and merely bound to Julia, such as TensorFlow.jl, which are more or less limited to the functionality exposed by their underlying implementation.

Installation of Julia

Before we move further: if you don’t have Julia installed on your system, it can be downloaded from its official site.

To use Julia in a Jupyter notebook, as we would with Python, we only need to add the IJulia package as follows; we can then run Julia right from the Jupyter notebook.

using Pkg
Pkg.add("IJulia")

We can then use Julia in Jupyter notebooks for exploratory data analysis, just as we would use Python.

Arrays in Julia

Before moving on to the framework, we need to understand the basic building blocks of a deep learning framework: arrays, CUDA arrays, etc. In this section, I’ll explain the basics of arrays in Julia.

Here’s a vector – a one-dimensional array with just three elements.

x = [10,12,16]

Here’s a matrix – a square array with four elements.

x = [10 12; 13 14]

Here’s a 12×2 matrix with 24 elements, each a random number between 0 and 1.

x = rand(12, 2)

rand is not the only function that can create a matrix (array); we can also use functions like ones, zeros, or randn. Try them out in the Jupyter notebook to see what they do.

By default, Julia stores floating-point numbers (including the output of rand) in a high-precision format called Float64. In machine learning we often don’t need that many digits, so we can ask for Float32 instead, or, if we need more precision than 64 bits, we can use BigFloat. Below are examples of a random 6×3 matrix (18 elements) of BigFloat and of Float32.

x = rand(BigFloat, 6, 3)
x = rand(Float32, 6, 3)

To count the number of elements in a matrix, we can use the length function.

length(x)

Or, if we need the size, we can check it more explicitly.

size(x)

We can do many sorts of algebraic operations on matrices; for example, we can add two matrices

x + x

Or subtract them.

x - x

Julia supports a feature called broadcasting, using the “.” syntax. The built-in function broadcast(f, ...) applies a function f element-wise over collections such as arrays or tuples, and the concise dot syntax is shorthand for it: f.(a, b) means “apply f elementwise to a and b”. We can use broadcasting to add 1 to every element of x.

x .+ 1

Finally, we use matrix multiplication more or less every time we do machine learning. It is super-easy in Julia.

W = randn(4, 10)
x = rand(10)
W * x

CUDA Arrays in Julia

CUDA functionality is provided separately by Julia’s CUDA package. If you have a GPU and CUDA available, you can run ] add CUDA in IJulia (Jupyter notebook) to get it. Once CUDA is installed (this guide used versions compatible with Julia below 1.6), we can transfer our arrays to the GPU as CUDA arrays using the cu function. CUDA arrays support all the basic functionality of an array, but run on the GPU.

The CuArray type is the main way to work with arrays on GPU hardware. In this section, I will briefly demonstrate its use. Since CUDA’s functionality is exposed by implementing existing Julia interfaces on the CuArray type, we can refer to the upstream Julia documentation for more in-depth information on these operations.

import Pkg
Pkg.add("CUDA")

Import the library and convert a matrix into a CuArray. Operations on CuArrays now run on the GPU, which is typically much faster than the CPU for large arrays, and we barely had to change anything.

using CUDA
x = cu(rand(6, 3))

Flux.jl in Julia

Flux is a Julia library (package) specifically for machine learning. It comes with a vast range of functionality that helps us harness the full potential of Julia without getting our hands messy (for example, automatic differentiation). Flux.jl follows a few key principles:

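Doing the obvious thing – Flux has relatively few explicit APIs; instead, writing down the mathematical form of what you want will work, and will be fast.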

You could have written Flux – all of it, from LSTMs to GPU kernels, is straightforward Julia code. When in doubt, you can always look at the documentation, and if you need something different, you can easily write your own code.

Integrates nicely with others – Flux works well with Julia libraries from DataFrames to Images (the Images package) and even differential equation solvers (another family of Julia packages for computation), so you can easily build complex data processing pipelines that integrate Flux models.


You can add Flux using Julia’s package manager, by typing ] add Flux in the Julia prompt, or by running

import Pkg
Pkg.add("Flux")

Automatic Differentiation

Automatic differentiation (AD), also called algorithmic differentiation or simply “autodiff”, is used to calculate the derivatives of functions. It is a family of techniques, of which backpropagation is a special case, for efficiently evaluating derivatives of numeric functions expressed as computer programs.

You have probably learned to differentiate functions in calculus class, but let’s recap it in Julia code.

f(x) = 4x^2 + 12x + 3
f(4)

In simple cases like this one, we can easily find the gradient by hand: here it is 8x + 12. But it’s much faster and more efficient to let Flux do it for us!

using Flux: gradient
df(x) = gradient(f, x)[1]
df(4)

We can cross-check with a few more inputs to confirm that the gradient calculated by Flux is indeed 8x + 12. We can also differentiate repeatedly: since f is a quadratic polynomial, its second derivative is just the constant 8.

ddf(x) = gradient(df, x)[1]
ddf(4)
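For reference, these are the values Flux should reproduce here, worked out by hand:

```latex
\begin{aligned}
f(x)   &= 4x^2 + 12x + 3, & f(4)   &= 64 + 48 + 3 = 115\\
f'(x)  &= 8x + 12,        & f'(4)  &= 44\\
f''(x) &= 8,              & f''(4) &= 8
\end{aligned}
```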

As long as the mathematical functions we write in Julia are differentiable, Flux’s automatic differentiation can handle most code we throw at it, including recursion, loops, and even custom layers (a notable exception is code that mutates arrays in place). For example, we can differentiate the Taylor series approximation of the sin function.

mysin(x) = sum((-1)^k * x^(1 + 2k) / factorial(1 + 2k) for k in 0:6)
x = 0.6
mysin(x), gradient(mysin, x)
sin(x), cos(x)

As expected, the derivative is numerically very close to cos(x), the derivative of sin(x).
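The match is so close because of what term-by-term differentiation does here: the derivative of the truncated sine series is exactly the truncated Taylor series of cos x. For an alternating series like this, the difference from cos x is bounded by the first omitted term, x¹⁴/14!, which is below 10⁻¹⁴ at x = 0.6.

```latex
\begin{aligned}
\operatorname{mysin}(x) &= \sum_{k=0}^{6} \frac{(-1)^k\, x^{2k+1}}{(2k+1)!}\\
\frac{d}{dx}\operatorname{mysin}(x) &= \sum_{k=0}^{6} \frac{(-1)^k\,(2k+1)\, x^{2k}}{(2k+1)!}
  = \sum_{k=0}^{6} \frac{(-1)^k\, x^{2k}}{(2k)!} \approx \cos x
\end{aligned}
```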

What if, instead of taking a single number as input, we take arrays as inputs? This gets more interesting. Let’s take an example of a function that takes a matrix and two vectors.

myloss(W, b, x) = sum(W * x .+ b) # calculating the loss
W = randn(3, 5)
b = zeros(3)
x = rand(5)
gradient(myloss, W, b, x)
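For this particular loss, the gradients can also be worked out by hand, which gives a way to sanity-check Flux’s output:

```latex
\begin{aligned}
&\mathrm{myloss}(W, b, x) = \sum_{i=1}^{3} \Big( \sum_{j=1}^{5} W_{ij}\, x_j + b_i \Big)\\
&\frac{\partial\, \mathrm{myloss}}{\partial W_{ij}} = x_j, \qquad
 \frac{\partial\, \mathrm{myloss}}{\partial b_i} = 1, \qquad
 \frac{\partial\, \mathrm{myloss}}{\partial x_j} = \sum_{i=1}^{3} W_{ij}
\end{aligned}
```

So the gradient with respect to W is a 3×5 matrix whose rows all equal xᵀ, the gradient with respect to b is a vector of ones, and the gradient with respect to x holds the column sums of W.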

Now we get gradients for each of the inputs W, b, and x, which will come in very handy when we train our model. Since machine learning models can contain thousands or even millions of parameters, Flux provides a slightly different way of computing gradients. Just like in other deep learning frameworks, we mark our arrays with params to indicate that we want their gradients. Here W and b represent the weight and bias respectively.

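using Flux: params
W = randn(3, 5)
b = zeros(3)
x = rand(5)
y(x) = sum(W * x .+ b)
grads = gradient(() -> y(x), params(W, b)) # gradients w.r.t. the tracked arrays
grads[W], grads[b]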

Using those parameters, we can get the gradients of W and b directly. This is especially useful when we are working with layers: think of a layer as a container for parameters. For example, the Dense layer from Flux performs the familiar linear transform.

using Flux
m = Dense(10, 5)
x = rand(Float32, 10)

To get the parameters of any layer or model, we can always simply use params from Flux.

params(m)

So even if our network has many, many parameters, we can easily calculate the gradients for all of them.

x = rand(Float32, 10) # random input
m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax) # a small two-layer model
l(x) = sum(Flux.crossentropy(m(x), [0.5, 0.5])) # loss function
l(x)
grads = gradient(() -> l(x), params(m)) # gradients for all parameters of m

We don’t have to use layers explicitly, but they can be very convenient for many simple kinds of models and for faster iteration.

The next step is to update the weights of the network, performing optimization with one of several algorithms. The first optimization algorithm that comes to mind is gradient descent, because of its simplicity: we step the weights against the gradients, scaled by a learning rate (a hyperparameter): weights = weights – learning_rate × gradient.
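Written as an update rule, with θ standing for the parameters, η for the learning rate, and ℒ for the loss, one gradient descent step is:

```latex
\theta \leftarrow \theta - \eta\, \nabla_{\theta}\, \mathcal{L}(\theta)
```

For example, with η = 0.1, a parameter whose gradient is 2 moves by −0.2 in one step.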

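using Flux.Optimise: update!, Descent
η = 0.1 # learning rate
grads = gradient(() -> l(x), params(m)) # gradients of the loss l defined above
for p in params(m)
    p .-= η .* grads[p] # gradient descent step on each parameter array
end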

While updating the parameters in place with their gradients, as above, is valid, it gets much more complicated as the optimization algorithms become more involved. Here Flux comes to the rescue with a prebuilt set of optimizers that make our work much easier. All we need to do is give the algorithm a learning rate, and that’s it.

opt = Descent(0.01)

So training a network reduces to iterating over the dataset multiple times (epochs) and performing all the steps in order. For the sake of simplicity and clarity, let’s quickly train a network that learns to predict 0.5 for every input of 10 floats. Flux has a function called train! that does all of this for us.

data, labels = rand(10, 100), fill(0.5, 2, 100) # dataset
loss(x, y) = sum(Flux.crossentropy(m(x), y)) # loss function
Flux.train!(loss, params(m), [(data, labels)], opt) # training the model

You don’t have to use train!. In cases where custom logic is better suited, you can open up the training loop like so:

for d in training_set # assuming each d looks like (data, labels)
    # our logic here
    gs = gradient(params(m)) do # m is our model
        l = loss(d...)
    end
    update!(opt, params(m), gs)
end

This concludes the basics of Flux usage. In the next section, we will use it to train a classifier on the CIFAR10 dataset.

Training a Classifier for the Deep Learning Model

Getting a real classifier to work will help cement the Julia workflow a bit more. CIFAR10 is a dataset of 50k tiny training images (plus 10k test images) split into 10 classes such as dogs, birds, and deer. Please check the image below for more details.

We will do the following steps in order to get a classifier trained –

Load the dataset of CIFAR10 (both training and test dataset)

Create a Convolutional Neural Network (CNN)

Define a loss function to calculate losses

Use training data to train our network

Evaluate our model on the test dataset

Install these useful libraries before we proceed; installation is simple but might take a few minutes to complete.

] add Metalhead # to get the data
] add Images # image processing package
] add ImageIO # to output images

Loading the Dataset

Metalhead.jl is an excellent package with many classic predefined and pre-trained computer vision (CV) models. It also includes a variety of data loaders that come in handy when loading datasets.

using Statistics
using Flux, Flux.Optimise # deep learning framework
using Metalhead, Images # to load the dataset
using Metalhead: trainimgs
using Images.ImageCore # to work on image processing
using Flux: onehotbatch, onecold # to encode the labels
using Base.Iterators: partition
using CUDA # for GPU functionality

This image will give us an idea of the different types of labels we are dealing with.

# download the CIFAR10 dataset
X = trainimgs(CIFAR10) # take the training dataset as X
labels = onehotbatch([X[i].ground_truth.class for i in 1:50000], 1:10) # one-hot encode the labels

To get more information about what we are dealing with, let’s take a look at a few random images from the dataset.

image(x) = x.img # handy for use later
ground_truth(x) = x.ground_truth
image.(X[rand(1:end, 10)]) # show 10 random images in IJulia itself

Each image is a 32×32×3 array: three 32×32 matrices, one per RGB channel, which together form the images we see above. Since the dataset is large, we pass it to the model in batches (of 1000 images) and keep a set aside for validation to evaluate our model. Passing the data in batches is called mini-batch learning, and it is very popular in machine learning. In layman’s terms, rather than sending our entire dataset, which is big and might not fit in memory, we break it into small packets (mini-batches), usually chosen randomly, and train our model on them. The noise in mini-batch gradients has also been observed to help escape saddle points (points where the gradient is zero but which are neither a minimum nor a maximum).

First, we define a getarray function that converts the images into arrays of floats.

getarray(X) = float.(permutedims(channelview(X), (2, 3, 1))) # image -> height x width x channels float array
imgs = [getarray(X[i].img) for i in 1:50000] # convert all 50,000 training images

Of the 50,000 training images, the first 49,000 form our training set, in batches of 1000, and the rest are kept as a validation set. To achieve this, we use the partition function, which conveniently breaks the range we give it into consecutive parts (of 1000), and the cat function to concatenate the images of each batch along the fourth dimension.

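train = gpu.([(cat(imgs[i]..., dims = 4), labels[:, i]) for i in partition(1:49000, 1000)]) # training batches
valset = 49001:50000
valX = cat(imgs[valset]..., dims = 4) |> gpu # validation images
valY = labels[:, valset] |> gpu # validation labels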

Defining the Classifier

Now comes the part where we can define our Convolutional Neural Network (CNN).

A convolutional neural network is one that defines a kernel and slides it across a matrix to create an intermediate representation from which features are extracted. As it goes into deeper layers, it creates higher-order features, which makes it suitable for images (although it can be used in plenty of other situations), where the structure of the subject is what helps us determine which class it belongs to.

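m = Chain( # creating a CNN
    Conv((5, 5), 3 => 16, relu), # first convolutional layer: 32x32x3 -> 28x28x16
    MaxPool((2, 2)), # -> 14x14x16
    Conv((5, 5), 16 => 8, relu), # second convolutional layer: -> 10x10x8
    MaxPool((2, 2)), # -> 5x5x8
    x -> reshape(x, :, size(x, 4)), # flatten to 200 features per image
    Dense(200, 120), # first dense layer
    Dense(120, 84), # second dense layer
    Dense(84, 10), # third and final layer with 10 classification labels
    softmax) |> gpu # class probabilities; move the model to the GPU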

Whenever we work with data that has multiple mutually exclusive classes, cross-entropy comes in handy. As for momentum: as the name suggests, the optimizer keeps a running, exponentially decaying average of past gradients (a “velocity”) and steps along it instead of along the raw gradient. This damps oscillations, speeds up progress along directions where successive gradients agree, and can help carry the optimization past shallow local minima.

using Flux: crossentropy, Momentum # import the loss and the optimizer
loss(x, y) = sum(crossentropy(m(x), y)) # loss function
opt = Momentum(0.01) # learning rate 0.01 (momentum coefficient left at its default)
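For reference, here are the per-sample cross-entropy for C = 10 classes (y the one-hot label, ŷ the softmax output) and the classical momentum update (g the gradient, v the velocity, η the learning rate, ρ the momentum coefficient). As far as we know, Flux’s Momentum implements this update in an equivalent sign convention, with ρ defaulting to 0.9.

```latex
\begin{aligned}
\mathcal{L}(\hat{y}, y) &= -\sum_{c=1}^{10} y_c \log \hat{y}_c\\
v &\leftarrow \rho\, v + \eta\, g, \qquad \theta \leftarrow \theta - v
\end{aligned}
```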

Before starting our training loop, we need a basic accuracy measure for our model to keep track of our progress. We can write a custom function for just that.

accuracy(x, y) = mean(onecold(m(x), 1:10) .== onecold(y, 1:10))

Training the Classifier

This is where we finally stitch everything together and run all the operations defined previously to see what our model is capable of. For this tutorial, we will only use 10 passes over the dataset (epochs), although for greater accuracy you can increase the number of epochs and play with the hyperparameters a bit.

epochs = 10 # number of passes over the training data
for epoch = 1:epochs
    for d in train
        gs = gradient(params(m)) do
            l = loss(d...) # calculate the loss on this mini-batch
        end
        update!(opt, params(m), gs) # update the parameter weights
    end
    @show accuracy(valX, valY) # show the model's validation accuracy after each epoch
end

The step-by-step training output gives us an idea of how the network was learning. The resulting accuracy is not bad at all for a small model trained for only a few epochs with no hyperparameter tuning.

Training on a GPU

Testing the Network

We have trained our neural network for 10 passes over the training dataset, but we need to check whether the model has learned anything at all. To check this, we predict the class label from the network’s output and compare it with the true class label. If the prediction is correct, we count that sample as a correct prediction. This is done on the still unseen part of the data.

Firstly, we would have to get the same processing of images as we did on the training data set to compare them side by side.

valset = valimgs(CIFAR10) # load the CIFAR10 test images
valimg = [getarray(valset[i].img) for i in 1:10000] # convert them to float arrays
labels = onehotbatch([valset[i].ground_truth.class for i in 1:10000], 1:10) # one-hot encode the labels
test = gpu.([(cat(valimg[i]..., dims = 4), labels[:, i]) for i in partition(1:10000, 1000)]) # batches of 1000

Next, we display some of the images from our validation dataset.

ids = rand(1:10000, 10) # random image ids
image.(valset[ids]) # show the images

The network produces 10 output values per image, one for each class. The higher the value for a class, the more the network believes the image belongs to that class. The image below shows these values (energies) as 10 floats, where every column corresponds to the output for one image.

Let’s see how our model fared on the dataset.

rand_test = getarray.(image.(valset[ids])) # convert the sample images to arrays
rand_test = cat(rand_test..., dims = 4) |> gpu # stack them into one 4-D batch
rand_truth = ground_truth.(valset[ids]) # true labels, to compare against
m(rand_test)

This looks similar to what we expected. Even after the short training period, let’s see how our model actually performs on new data it has not seen (the test set we prepared).

accuracy(test[1]...) # test accuracy on the first batch of 1000 test images

49% is clearly much better than the 10% we would get by guessing randomly (since we have 10 classes), which is not bad at all for a small hand-coded model like ours without hyperparameter tuning.

Let’s take a look at how the network performed on each class individually.

class_correct = zeros(10) # correct predictions per class
class_total = zeros(10) # number of test images per class
for i in 1:10
    preds = m(test[i][1]) # predictions for the i-th batch
    lab = test[i][2]
    for j = 1:1000
        pred_class = findmax(preds[:, j])[2] # predicted class (argmax)
        actual_class = findmax(lab[:, j])[2] # true class
        if pred_class == actual_class # correct prediction
            class_correct[pred_class] += 1
        end
        class_total[actual_class] += 1
    end
end
class_correct ./ class_total # per-class accuracy

The spread seems pretty good, but some classes perform significantly better than others. Exploring the reason is left to the reader.


In this article, we learned how powerful Julia is when it comes to computation. We learned about the Flux package and how to use it to train our hand-written model to classify images into 10 different classes in just a few lines of code, and on a GPU, too! We also learned about CuArrays and their role in decreasing computation time. I hope this article helps you start your journey with Flux (Julia).

Thanks to Mike Innes, Andrew Dinhobl, Ygor Canalli et al. for their valuable documentation. Reach out to me via LinkedIn (Nihal Singh).


A Basic Introduction To Opencv In Deep Learning

This article was published as a part of the Data Science Blogathon.

OpenCV is a massive open-source library for fields like computer vision, machine learning, and image processing, and it plays a critical role in real-time operations, which are fundamental in today’s systems. It is used to detect objects, faces, diseases, lesions, number plates, and even handwriting in images and videos. Combined with deep learning, OpenCV lets us represent images as vectors and execute mathematical operations on these features to identify visual patterns and their var

Table of Contents

  What is Computer Vision?


Installing and Importing the OpenCV Image Preprocessing Package 

 Reading an Input Image

 Image Data Type

 Image Resolution

 Image Pixel Values

 Viewing the Images


Image Operations Using OpenCV and Python

 OpenCV Applications

 Functionality of OpenCV


What is Computer Vision?

Computer vision is the field concerned with how images and videos are stored, and with manipulating and extracting information from them. It is one of the major fields of Artificial Intelligence. Self-driving cars, robotics, and picture editing apps all rely heavily on computer vision.

Computer vision resembles human vision. Human vision learns from life experiences and uses them to distinguish objects, interpret the distance between them, and estimate their relative positions.

Source: Analytics Vidhya

With cameras, data, and algorithms, computer vision trains machines to accomplish these jobs in much less time.

Computer vision allows computers and systems to extract useful data from digital images and video inputs.

Installing and Importing the OpenCV Image Preprocessing Package

OpenCV is an extremely important tool for many machine learning and deep learning applications. It is an open-source library (package) for computer vision, machine learning, and image processing that runs primarily on the CPU, with optional GPU acceleration (for example through CUDA or OpenCL) in builds that support it. It works with many different programming languages, including Python, and it can be installed with a single command, as shown below:

pip install opencv-python

A package in Python is a collection of modules containing pre-written code. Modules from a package can be imported individually or as a whole. Once OpenCV is installed, using it is as simple as importing the "cv2" module:

import cv2 as cv

Reading an Input Image

Colour photographs, grayscale photographs, binary photographs and multispectral photographs are all examples of digital images. In a colour image, each pixel carries its own colour information. Binary images have only two colours, usually black and white, and grayscale images contain only shades of grey. Multispectral images capture image data in several specific wavelength bands across the electromagnetic spectrum.

To read the image, we use the “imread” method from the cv2 package, where the first parameter is the image’s path, including filename and extension, and the second parameter is a flag that determines how to read in the image.

The features of a picture that is being utilised as an input

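import cv2

# Read the image from disk with cv2.imread (as a 3-channel colour image)
img = cv2.imread("pythonlogo.png", cv2.IMREAD_COLOR)

# Create a GUI window and display the image on screen
cv2.imshow("Cute Kittens", img)
cv2.waitKey(0)           # wait for a key press
cv2.destroyAllWindows()  # then close the window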


Image Data Type

To find the image's data type, use the "dtype" attribute. It tells us how the visual data is represented and what range of values each pixel can take.

Besides the data type, note that the image returned by OpenCV is a NumPy array (ndarray): a multidimensional container of items of the same type and size.

Pixel values for the image

An image can be thought of as a collection of small samples called pixels. To see them, zoom in on an image as far as possible: it breaks up into a grid of small squares. Each square is a pixel, and together the pixels form the image. One of the simplest ways to represent an image is therefore as a matrix of pixel values.


print("The data type of the image is", img.dtype)

Output: The data type of the image is uint8

uint8 means that each pixel value is stored as an unsigned 8-bit integer, so values range from 0 to 255.

Image Resolution

Image resolution is the number of pixels in an image, usually given as width x height. As the number of pixels rises, the image can show finer detail. The image's shape gives its number of rows and columns. Common resolutions include:

• 320 x 240 pixels (mostly suitable for small-screen devices)

• 1024 x 768 pixels (appropriate for viewing on standard computer monitors)

• 720 x 576 pixels (good for standard-definition TV sets with a 4:3 aspect ratio)

• 1280 x 720 pixels (for viewing on widescreen monitors)

• 1280 x 1024 pixels (for viewing on full-screen monitors)
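The relationship between resolution, pixel count and memory is simple arithmetic. The plain-Python sketch below (no OpenCV needed) computes both for the resolutions listed above. It assumes a 3-channel image with one uint8 byte per channel value, as OpenCV uses for colour images; the helper names are our own, and compressed formats such as JPEG take far less space on disk.

```python
# Pixel counts and raw in-memory sizes for common resolutions.
# Assumption: 3 colour channels, 1 byte (uint8) per channel value.
RESOLUTIONS = [(320, 240), (1024, 768), (720, 576), (1280, 720), (1280, 1024)]


def pixel_count(width, height):
    """Total number of pixels in a width x height image."""
    return width * height


def uncompressed_bytes(width, height, channels=3, bytes_per_value=1):
    """Memory needed to hold the raw pixel array, without compression."""
    return width * height * channels * bytes_per_value


for w, h in RESOLUTIONS:
    print(f"{w} x {h}: {pixel_count(w, h):,} pixels, "
          f"{uncompressed_bytes(w, h):,} bytes as a 3-channel uint8 array")
```

The same formula explains the size of any loaded image array: rows x columns x channels.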

Image Pixel Values

An image can be thought of as a collection of small samples, called pixels. To see them, zoom in on a picture as far as possible: it divides into many small squares. These squares are pixels, and combined they make up the image.

The level of detail an image can show increases as the number of pixels in the image increases. The image's shape determines its number of rows and columns.

Viewing the Images

Let's look at how to display the image in a window. To do so, we create a graphical user interface (GUI) window. The first parameter is the window title and must be a string; the second is the image array. The cv2.imshow() function displays the image in a pop-up window, but on its own it does not keep the window open and responsive. We use the "waitKey" function to handle this.

Passing 0 to waitKey keeps the window open until a key is pressed. (You can instead pass a time in milliseconds, indicating how long to wait for a key press before continuing.)

import cv2

# Read the image from disk with cv2.imread
img = cv2.imread("python logo.png", cv2.IMREAD_COLOR)

# Create a GUI window to display the image on screen:
# the first parameter is the window title (a string),
# the second parameter is the image array
cv2.imshow("The Logo", img)

# Hold the window on screen; with 0 as the argument,
# waitKey waits until the user presses a key
cv2.waitKey(0)

# Remove the created GUI window from the screen and from memory
cv2.destroyAllWindows()


Output: GUI Window, Source: Author

Extracting image bit planes and reconstructing the image from them

An 8-bit image can be divided into 8 bit planes (bit 0 to bit 7), each holding one bit of every pixel. The higher-order planes (bits 7, 6, 5, …) contain most of the image's visually significant information.
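As a minimal sketch of bit-plane slicing, the NumPy code below splits a uint8 grayscale array into its 8 planes with bit shifts and rebuilds an image from any subset of them. The helper names and the tiny 2 x 2 array are illustrative assumptions, not part of OpenCV; with a real picture you would first load it in grayscale, for example with cv2.imread(path, cv2.IMREAD_GRAYSCALE).

```python
import numpy as np


def bit_planes(gray):
    """Split an 8-bit grayscale image into 8 binary planes.

    planes[k] holds bit k of every pixel (k = 0 is the least significant bit).
    """
    if gray.dtype != np.uint8:
        raise ValueError("expected a uint8 image")
    return [(gray >> k) & 1 for k in range(8)]


def reconstruct(planes, keep=range(8)):
    """Rebuild an image from the selected bit planes; unselected planes count as 0."""
    img = np.zeros_like(planes[0], dtype=np.uint8)
    for k in keep:
        img |= (planes[k] << k).astype(np.uint8)
    return img


# Hypothetical 2 x 2 grayscale image, for illustration only
gray = np.array([[200, 13], [255, 0]], dtype=np.uint8)
planes = bit_planes(gray)
full = reconstruct(planes)                     # all 8 planes: identical to gray
top4 = reconstruct(planes, keep=range(4, 8))   # only bits 4 to 7
print(top4)
```

Keeping only bits 4 to 7 changes each pixel by at most 15 out of 255, which is why the higher-order planes are said to hold most of the image's information.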

Image Operations Using OpenCV and Python


Checking Properties of the Input Image

Input Image:

                                                                                  Source: Author

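import cv2
import numpy as np
import matplotlib.pyplot as plt

# Read the image as a NumPy array (matplotlib returns it in RGB order)
img = plt.imread("my pic.jpg")
plt.imshow(img)
plt.show()

print(img.shape)   # (rows, columns, channels)
print(img.size)    # total number of values in the array
print(img.dtype)   # data type of each value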


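(1921, 1921, 3)
11070723
uint8

The image has 1921 rows, 1921 columns and 3 colour channels, so its size is 1921 × 1921 × 3 = 11,070,723 values, each stored as uint8.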



Basic Image Processing 

Input Image:


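import cv2
import numpy as np
import matplotlib.pyplot as plt

# OpenCV loads colour images in BGR channel order
image = cv2.imread("baby yoda.jpg")

# cv2.imshow('Example - Show image in window', image)

# Convert BGR to RGB so that matplotlib displays the colours correctly
img2 = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
plt.imshow(img2)
plt.show()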


                                                                                      Source: Author
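Why convert at all? OpenCV stores colour images in BGR channel order, while matplotlib expects RGB, so showing an OpenCV image with plt.imshow without conversion swaps red and blue. For a 3-channel image the conversion amounts to reversing the last axis. The NumPy sketch below shows this on a hypothetical two-pixel image; bgr_to_rgb is our own helper, not an OpenCV function.

```python
import numpy as np


def bgr_to_rgb(image):
    """Reverse the channel axis of an H x W x 3 array (BGR -> RGB).

    For 3-channel images this gives the same result as
    cv2.cvtColor(image, cv2.COLOR_BGR2RGB). The reversed slice is a view,
    so it is copied into a contiguous array.
    """
    return np.ascontiguousarray(image[..., ::-1])


# Hypothetical 1 x 2 image in BGR order: a pure-blue and a pure-red pixel
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)
rgb = bgr_to_rgb(bgr)
print(rgb[0, 0], rgb[0, 1])
```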

Dilation and Erosion of the Input Image 

Input Image:

import cv2
import numpy as np
import matplotlib.pyplot as plt

img = plt.imread("baby yoda.jpg")

# Take a 5 x 5 matrix of ones as the kernel
kernel = np.ones((5, 5), np.uint8)

# The first parameter is the original image, the second is the kernel
# (structuring element) slid over the image, and the third is the number
# of iterations, which determines how much the image is eroded/dilated
img_erosion = cv2.erode(img, kernel, iterations=1)
img_dilation = cv2.dilate(img, kernel, iterations=1)

# Show the three images side by side (successive plt.imshow calls on
# the same axes would leave only the last image visible)
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, im, title in zip(axes, [img, img_erosion, img_dilation],
                         ["Original", "Erosion", "Dilation"]):
    ax.imshow(im)
    ax.set_title(title)
plt.show()


Source: Author, Image after erosion effect

Source: Author, Image after dilation effect
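Erosion and dilation are easiest to understand as minimum and maximum filters: with a square kernel of ones, erosion replaces each pixel by the smallest value in its k x k neighbourhood and dilation by the largest. The NumPy sketch below implements this for a single-channel image. It is an illustration under our own assumptions (odd k, square kernel, edge-replicated borders), not OpenCV's implementation; for colour images you would apply it to each channel separately.

```python
import numpy as np


def _neighbourhood_stack(img, k):
    """Stack every shifted copy of a 2-D image covering a k x k window."""
    if k % 2 == 0:
        raise ValueError("kernel size must be odd")
    pad = k // 2
    # Edge replication, so pixels outside the image never win the min/max
    padded = np.pad(img, pad, mode="edge")
    h, w = img.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])


def erode(img, k=3):
    """Grayscale erosion: each pixel becomes the minimum of its k x k neighbourhood."""
    return _neighbourhood_stack(img, k).min(axis=0)


def dilate(img, k=3):
    """Grayscale dilation: each pixel becomes the maximum of its k x k neighbourhood."""
    return _neighbourhood_stack(img, k).max(axis=0)


# Hypothetical 5 x 5 image with a single bright pixel in the centre
img = np.zeros((5, 5), dtype=np.uint8)
img[2, 2] = 255
print(np.count_nonzero(dilate(img)))  # dilation grows the spot to a 3 x 3 block
print(erode(img).max())               # erosion removes the spot entirely
```

This is why erosion shrinks bright regions and removes small specks, while dilation grows bright regions and fills small dark gaps; a larger kernel such as the 5 x 5 one above strengthens both effects.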

OpenCV Applications

• Face recognition

• Counting the number of people (foot traffic in a mall, for example)

• Counting the number of automobiles on motorways and their speeds

• Interaction-based art installations

• Detecting anomalies (defects) during the production process, such as the odd defective product

• Street view image stitching

• Video/image search and retrieval

• Robot and autonomous car navigation and control

• Object recognition

• Medical image analysis

• Movies – 3D structure from motion

Functionality of OpenCV

• I/O, processing and display of images and videos

• Detection of objects and features

• Computer vision

• Computer-assisted photography


So in this article, we covered a basic introduction to the OpenCV library and its applications in real-time scenarios. We also covered key terminology and the field in which OpenCV is deployed (computer vision), and implemented Python code for some basic image operations (dilation, erosion and changing image colours) with the help of the OpenCV library. Beyond these, OpenCV finds applications in a wide variety of industries.

Hello Everyone!!!

My name is Pranshu Sharma and I am a Data Science Enthusiast

For any feedback Email me at [email protected]

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.


Google Colab For Machine Learning And Deep Learning

“Memory Error” – that all too familiar dreaded message in Jupyter notebooks when we try to execute a machine learning or deep learning algorithm on a large dataset. Most of us do not have access to unlimited computational power on our machines. And let’s face it, it costs an arm and a leg to get a decent GPU from existing cloud providers. So how do we build large deep learning models without burning a hole in our pockets? Step up – Google Colab!

It’s an incredible online browser-based platform that allows us to train our models on machines for free! Sounds too good to be true, but thanks to Google, we can now work with large datasets, build complex models, and even share our work seamlessly with others. That’s the power of Google Colab.

What is Google Colab?

Google Colaboratory is a free online cloud-based Jupyter notebook environment that allows us to train our machine learning and deep learning models on CPUs, GPUs, and TPUs.

Here's what I truly love about Colab. It does not matter which computer you have, what its configuration is, or how ancient it might be. You can still use Google Colab! All you need is a Google account and a web browser. And here's the cherry on top: you get access to GPUs like the Tesla K80 and even a TPU, for free!

TPUs are much more expensive than GPUs, and you can use them for free on Colab. It's worth repeating again and again: it's an offering like no other.

Are you still using that same old Jupyter notebook on your system for training models? Trust me, you're going to love Google Colab.

What is a Notebook in Google Colab?

Google Colab Features

Colab provides users free access to GPUs and TPUs, which can significantly speed up the training and inference of machine learning and deep learning models.

Colab’s interface is web-based, so installing any software on your local machine is unnecessary. The interface is also intuitive and user-friendly, making it easy to get started with coding.

Colab allows multiple users to work on the same notebook simultaneously, making collaborating with team members easy. Colab also integrates with other Google services, such as Google Drive and GitHub, making it easy to share your work.

Colab notebooks support markdown, which allows you to include formatted text, equations, and images alongside your code. This makes it easier to document your work and communicate your ideas.

Colab comes pre-installed with many popular libraries and tools for machine learning and deep learning, such as TensorFlow and PyTorch. This saves time and eliminates the need to manually install and configure these tools.

GPUs and TPUs on Google Colab

Ask anyone who uses Colab why they love it. The answer is unanimous: the availability of free GPUs and TPUs. Training models, especially deep learning ones, takes many hours on a CPU. We've all faced this issue on our local machines. GPUs and TPUs, on the other hand, can cut this training time dramatically, often to minutes.

If you still need a reason to work with GPUs, check out this excellent explanation by Faizan Shaikh.

It gives you a decent GPU for free, which you can run continuously for up to 12 hours. For most data science folks, this is sufficient for their computation needs. If you are a beginner, I highly recommend that you start using Google Colab.

Google Colab gives us three types of runtime for our notebooks:

• CPUs,

• GPUs, and

• TPUs.

As I mentioned, Colab gives us 12 hours of continuous execution time. After that, the whole virtual machine is cleared and we have to start again. We can run multiple CPU, GPU, and TPU instances simultaneously, but our resources are shared between these instances.

Let’s take a look at the specifications of different runtimes offered by Google Colab:

It will cost you A LOT to buy a GPU or TPU from the market. Why not save that money and use Google Colab from the comfort of your own machine?

How to Use Google Colab?

You can go to Google Colab using this link. This is the screen you’ll get when you open Colab:

You can also import your notebook from Google Drive or GitHub, but they require an authentication process.

Google Colab Runtimes – Choosing the GPU or TPU Option

The ability to choose different types of runtimes is what makes Colab so popular and powerful. Here are the steps to change the runtime of your notebook:

Step 1: Click on "Runtime" in the top menu and select "Change runtime type".

Step 2: Here you can change the runtime according to your needs:

A wise man once said, “With great power comes great responsibility.” I implore you to shut down your notebook after you have completed your work so that others can use these resources because various users share them. You can terminate your notebook like this:

Using Terminal Commands on Google Colab

You can use a Colab cell to run terminal commands. Most of the popular libraries come installed by default on Google Colab: Python libraries such as pandas, NumPy and scikit-learn are all pre-installed.

If you want to run a different Python library, you can always install it inside your Colab notebook like this:

!pip install <package-name>


Pretty easy, right? Everything works much as it does in a regular terminal. We just have to put an exclamation mark (!) before each command, like:




Cloning Repositories in Google Colab

You can also clone a Git repo inside Google Colaboratory. Just go to your GitHub repository and copy the clone link of the repository:

Then, simply run:

And there you go!

Uploading Files and Datasets

Here’s a must-know aspect for any data scientist. The ability to import your dataset into Colab is the first step in your data analysis journey.

The most basic approach is to upload your dataset to Colab directly:

You can also upload your dataset to any other platform and access it using its link. I tend to go with the second approach more often than not (when feasible).

Saving Your Notebook

All the notebooks on Colab are stored on your Google Drive. The best thing about Colab is that your notebook is automatically saved after a certain time period and you don’t lose your progress.

If you want, you can export and save your notebook in both *.py and *.ipynb formats:

Not just that, you can also save a copy of your notebook directly on GitHub, or you can create a GitHub Gist:

I love the variety of options we get.

Exporting Data/Files from Google Colab

You can export your files directly to Google Drive, or you can export them to the VM instance and download them yourself:

Exporting directly to the Drive is a better option when you have bigger files or more than one file. You’ll pick up these nuances as you work on bigger projects in Colab.

Sharing Your Notebook

Google Colab also gives us an easy way of sharing our work with others. This is one of the best things about Colab:

What’s Next?

Google Colab now also provides a paid platform called Google Colab Pro, priced at $9.99 a month. In this plan, you can get the Tesla T4 or Tesla P100 GPU, and an option of selecting an instance with a high RAM of around 27 GB. Also, your maximum computation time is doubled from 12 hours to 24 hours. How cool is that?

You can consider this plan if you need high computation power because it is still quite cheap when compared to other cloud GPU providers like AWS, Azure, and even GCP.


If you’re new to the world of Deep Learning, I have some excellent resources to help you get started in a comprehensive and structured manner:


Stock Prices Prediction Using Machine Learning And Deep Learning


Predicting how the stock market will perform is one of the most difficult things to do. There are so many factors involved in the prediction – physical factors vs. psychological, rational and irrational behavior, etc. All these aspects combine to make share prices volatile and very difficult to predict with a high degree of accuracy.

Can we use machine learning as a game-changer in this domain? Using features like the latest announcements about an organization, its quarterly revenue results, etc., machine learning techniques have the potential to unearth patterns and insights we didn't see before, and these could be used to make more accurate predictions.

The core idea behind this article is to showcase how these algorithms are implemented. I will briefly describe the technique and provide relevant links to brush up on the concepts as and when necessary. In case you’re a newcomer to the world of time series, I suggest going through the following articles first:


Understanding the Problem Statement

We’ll dive into the implementation part of this article soon, but first it’s important to establish what we’re aiming to solve. Broadly, stock market analysis is divided into two parts – Fundamental Analysis and Technical Analysis.

Fundamental Analysis involves analyzing the company’s future profitability on the basis of its current business environment and financial performance.

Technical Analysis, on the other hand, includes reading the charts and using statistical figures to identify the trends in the stock market.

As you might have guessed, our focus will be on the technical analysis part. We’ll be using a dataset from Quandl (you can find historical data for various stocks here) and for this particular project, I have used the data for ‘Tata Global Beverages’. Time to dive in!

Note: Here is the dataset I used for the code: Download

We will first load the dataset and define the target variable for the problem:

Python Code:

There are multiple variables in the dataset – date, open, high, low, last, close, total_trade_quantity, and turnover.

The columns Open and Close represent the starting and final price at which the stock is traded on a particular day.

High, Low and Last represent the maximum, minimum, and last price of the share for the day.

Total Trade Quantity is the number of shares bought or sold in the day and Turnover (Lacs) is the turnover of the particular company on a given date.

Another important thing to note is that the market is closed on weekends and public holidays. Notice in the table above that some date values are missing: 2/10, 6/10 and 7/10 (2, 6 and 7 October). Of these dates, the 2nd is a national holiday while the 6th and 7th fall on a weekend.
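Gaps like these can be found programmatically. The pandas sketch below compares the trading dates with a full daily calendar and flags which missing dates fall on a weekend. The dates are hypothetical, chosen to reproduce the gap pattern described above, and the function name is our own. Holidays cannot be recognised from the calendar alone, so they show up as missing weekdays.

```python
import pandas as pd


def missing_dates(trading_days):
    """Return calendar dates absent from a DatetimeIndex, with a weekend flag."""
    full = pd.date_range(trading_days.min(), trading_days.max(), freq="D")
    gaps = full.difference(trading_days)
    # dayofweek: Monday = 0 ... Sunday = 6, so 5 and 6 are the weekend
    return pd.DataFrame({"date": gaps, "is_weekend": gaps.dayofweek >= 5})


# Hypothetical trading dates with three gaps: one weekday and one weekend
days = pd.to_datetime(["2018-10-01", "2018-10-03", "2018-10-04",
                       "2018-10-05", "2018-10-08"])
print(missing_dates(days))
```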

The profit or loss calculation is usually determined by the closing price of a stock for the day, hence we will consider the closing price as the target variable. Let’s plot the target variable to understand how it’s shaping up in our data:

#setting index as date
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']

#plot
plt.figure(figsize=(16,8))
plt.plot(df['Close'], label='Close Price history')

In the upcoming sections, we will explore these variables and use different techniques to predict the daily closing price of the stock.

Moving Average Introduction

‘Average’ is easily one of the most common things we use in our day-to-day lives. For instance, calculating the average marks to determine overall performance, or finding the average temperature of the past few days to get an idea about today’s temperature – these all are routine tasks we do on a regular basis. So this is a good starting point to use on our dataset for making predictions.

The predicted closing price for each day will be the average of a set of previously observed values. Instead of using the simple average, we will be using the moving average technique which uses the latest set of values for each prediction. In other words, for each subsequent step, the predicted values are taken into consideration while removing the oldest observed value from the set. Here is a simple figure that will help you understand this with more clarity.

We will implement this technique on our dataset. The first step is to create a dataframe that contains only the Date and Close price columns, then split it into train and validation sets to verify our predictions.
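The extracted text does not include the code for this step, so here is a minimal plain-Python sketch of the recursive moving average described above, together with the RMSE metric used to score every model in this article. It works on plain lists of hypothetical prices rather than the article's DataFrame, and the window size is a free parameter we chose for illustration; the function names are our own.

```python
import math


def moving_average_forecast(history, horizon, window):
    """Forecast `horizon` steps with a recursive moving average.

    Each prediction is the mean of the latest `window` values; the prediction
    is then appended to the window and the oldest value is dropped.
    """
    if len(history) < window:
        raise ValueError("need at least `window` observations")
    recent = list(history[-window:])
    preds = []
    for _ in range(horizon):
        nxt = sum(recent) / window
        preds.append(nxt)
        recent = recent[1:] + [nxt]
    return preds


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


# Hypothetical closing prices, for illustration only
train_close = [10.0, 12.0, 14.0, 16.0]
valid_close = [17.0, 18.0]
preds = moving_average_forecast(train_close, horizon=len(valid_close), window=2)
print(preds, rmse(valid_close, preds))
```

Because each forecast is fed back into the window, the predictions quickly flatten out toward a constant, which is one reason a moving average cannot follow the ups and downs of the validation period.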


Just checking the RMSE does not help us in understanding how the model performed. Let’s visualize this to get a more intuitive understanding. So here is a plot of the predicted values along with the actual values.

#plot
valid['Predictions'] = 0
valid['Predictions'] = preds
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

Inference

The RMSE value is close to 105 but the results are not very promising (as you can gather from the plot). The predicted values are of the same range as the observed values in the train set (there is an increasing trend initially and then a slow decrease).

In the next section, we will look at two commonly used machine learning techniques – Linear Regression and kNN, and see how they perform on our stock market data.

Linear Regression Introduction

The most basic machine learning algorithm that can be implemented on this data is linear regression. The linear regression model returns an equation that determines the relationship between the independent variables and the dependent variable.

The equation for linear regression can be written as:

$$Y = \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_n x_n$$

Here, x1, x2, …, xn represent the independent variables, while the coefficients θ1, θ2, …, θn represent the weights. You can refer to the following article to study linear regression in more detail:

For our problem statement, we do not have a set of independent variables; we have only the dates. Let us use the date column to extract features such as day, month, year and Monday/Friday, and then fit a linear regression model.


We will first sort the dataset in ascending order and then create a separate dataset so that any new feature created does not affect the original data.

#setting index as date values
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
df.index = df['Date']

#sorting
data = df.sort_index(ascending=True, axis=0)

#creating a separate dataset so new features do not affect the original data
new_data = data[['Date', 'Close']].reset_index(drop=True)

#create features (fastai v2+; older fastai used: from fastai.structured import add_datepart)
from fastai.tabular.core import add_datepart
new_data = add_datepart(new_data, 'Date')
new_data.drop('Elapsed', axis=1, inplace=True)  #elapsed will be the time stamp

This creates features such as:

‘Year’, ‘Month’, ‘Week’, ‘Day’, ‘Dayofweek’, ‘Dayofyear’, ‘Is_month_end’, ‘Is_month_start’, ‘Is_quarter_end’, ‘Is_quarter_start’,  ‘Is_year_end’, and  ‘Is_year_start’.

Note: I have used add_datepart from the fastai library. If you do not have it installed, you can simply use the command pip install fastai. Otherwise, you can create these features using simple for loops in Python. I have shown an example below.
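If you prefer not to depend on fastai, pandas' .dt accessor can produce very similar calendar features without explicit loops. The sketch below is our own helper, not fastai's add_datepart: it mirrors the column names listed above but does not add the Elapsed column or drop the original Date column. The sample dates and prices are hypothetical.

```python
import pandas as pd


def add_date_features(df, col="Date"):
    """Return a copy of df with calendar features derived from column `col`."""
    d = pd.to_datetime(df[col]).dt
    out = df.copy()
    out["Year"] = d.year
    out["Month"] = d.month
    out["Week"] = d.isocalendar().week.astype(int)  # ISO week number
    out["Day"] = d.day
    out["Dayofweek"] = d.dayofweek                   # Monday = 0 ... Sunday = 6
    out["Dayofyear"] = d.dayofyear
    for flag in ["is_month_end", "is_month_start", "is_quarter_end",
                 "is_quarter_start", "is_year_end", "is_year_start"]:
        out[flag.capitalize()] = getattr(d, flag)    # e.g. "Is_month_end"
    return out


sample = pd.DataFrame({"Date": ["2018-10-01", "2018-12-31"], "Close": [100.0, 101.5]})
feats = add_date_features(sample)
print(feats[["Dayofweek", "Week", "Is_month_end", "Is_year_end"]])
```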

Apart from this, we can add our own set of features that we believe would be relevant for the predictions. For instance, my hypothesis is that the first and last days of the week could potentially affect the closing price of the stock far more than the other days. So I have created a feature that identifies whether a given day is Monday/Friday or Tuesday/Wednesday/Thursday. This can be done using the following lines of code:

new_data['mon_fri'] = 0
for i in range(0, len(new_data)):
    if (new_data['Dayofweek'][i] == 0 or new_data['Dayofweek'][i] == 4):
        new_data.loc[i, 'mon_fri'] = 1
    else:
        new_data.loc[i, 'mon_fri'] = 0
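The same Monday/Friday flag can be computed in one vectorised step, which is faster than a Python loop on large frames. This sketch assumes the Dayofweek column created earlier (Monday = 0, Friday = 4); add_mon_fri is a hypothetical helper name.

```python
import pandas as pd


def add_mon_fri(df, dow_col="Dayofweek"):
    """Return a copy of df with mon_fri = 1 on Mondays (0) and Fridays (4), else 0."""
    out = df.copy()
    out["mon_fri"] = out[dow_col].isin([0, 4]).astype(int)
    return out


demo = pd.DataFrame({"Dayofweek": [0, 1, 2, 3, 4]})
print(add_mon_fri(demo)["mon_fri"].tolist())
```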

We will now split the data into train and validation sets to check the performance of the model.

#split into train and validation
train = new_data[:987]
valid = new_data[987:]

x_train = train.drop('Close', axis=1)
y_train = train['Close']
x_valid = valid.drop('Close', axis=1)
y_valid = valid['Close']

#implement linear regression
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train, y_train)

Results

#make predictions and find the rmse
preds = model.predict(x_valid)
rms = np.sqrt(np.mean(np.power((np.array(y_valid) - np.array(preds)), 2)))
rms

121.16291596523156

The RMSE value is higher than the previous technique, which clearly shows that linear regression has performed poorly. Let’s look at the plot and understand why linear regression has not done well:

#plot
valid['Predictions'] = 0
valid['Predictions'] = preds

valid.index = new_data[987:].index
train.index = new_data[:987].index

plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

Inference

As seen from the plot above, there was a drop in the stock price in January of each of the previous two years, and the model has predicted a similar drop for the following January. A linear regression technique can perform well for problems such as Big Mart sales, where the independent features are useful for determining the target value; here, features derived from the date alone carry little information about the price.

k-Nearest Neighbours Introduction

Another interesting ML algorithm that one can use here is kNN (k nearest neighbours). Based on the independent variables, kNN finds the similarity between new data points and old data points. Let me explain this with a simple example.

Consider the height and age for 11 people. On the basis of given features (‘Age’ and ‘Height’), the table can be represented in a graphical format as shown below:

To determine the weight for ID #11, kNN considers the weights of the nearest neighbours of this ID. The weight of ID #11 is predicted to be the average of its neighbours' weights. If we consider three neighbours (k=3) for now, the weight for ID #11 would be (77 + 72 + 60)/3 ≈ 69.67 kg.
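The averaging step above can be sketched in a few lines of plain Python. The helper knn_regress and the (age, height) points below are hypothetical, chosen only so that the three nearest neighbours have the weights 77, 72 and 60 from the example; scikit-learn's KNeighborsRegressor, used in the implementation, does the same job with more options.

```python
import math


def knn_regress(points, targets, query, k=3):
    """Predict a target for `query` as the mean target of its k nearest points."""
    # Sort (distance, target) pairs by Euclidean distance to the query
    dists = sorted((math.dist(p, query), t) for p, t in zip(points, targets))
    nearest = dists[:k]
    return sum(t for _, t in nearest) / k


# Hypothetical (age, height in feet) data and weights in kg
points = [(40, 5.5), (35, 5.4), (30, 5.0), (10, 3.5)]
targets = [77, 72, 60, 30]
pred = knn_regress(points, targets, query=(33, 5.3), k=3)
print(round(pred, 2))
```

Note that age and height are on very different scales, so age dominates the Euclidean distance; this is why the implementation below scales the features with MinMaxScaler first.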

For a detailed understanding of kNN, you can refer to the following articles:

Implementation

#importing libraries
from sklearn import neighbors
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import MinMaxScaler

scaler = MinMaxScaler(feature_range=(0, 1))

Using the same train and validation set from the last section:

#scaling data
x_train_scaled = scaler.fit_transform(x_train)
x_train = pd.DataFrame(x_train_scaled)
# note: refitting the scaler on x_valid scales it differently from x_train;
# scaler.transform(x_valid) would reuse the training-set scaling
x_valid_scaled = scaler.fit_transform(x_valid)
x_valid = pd.DataFrame(x_valid_scaled)

#using gridsearch to find the best parameter
params = {'n_neighbors': [2, 3, 4, 5, 6, 7, 8, 9]}
knn = neighbors.KNeighborsRegressor()
model = GridSearchCV(knn, params, cv=5)

#fit the model and make predictions
model.fit(x_train, y_train)
preds = model.predict(x_valid)

Results

#rmse
rms = np.sqrt(np.mean(np.power((np.array(y_valid) - np.array(preds)), 2)))
rms

115.17086550026721

There is not a huge difference in the RMSE value, but a plot of the predicted and actual values should provide a clearer understanding.

#plot
valid['Predictions'] = 0
valid['Predictions'] = preds
plt.plot(valid[['Close', 'Predictions']])
plt.plot(train['Close'])

Inference

The RMSE value is similar to that of the linear regression model, and the plot shows the same pattern. Like linear regression, kNN also predicted a drop in January, since that has been the pattern in past years. We can safely say that regression algorithms have not performed well on this dataset.

Let’s go ahead and look at some time series forecasting techniques to find out how they perform when faced with this stock prices prediction challenge.

Auto ARIMA Introduction

ARIMA is a very popular statistical method for time series forecasting. ARIMA models take into account the past values to predict the future values. There are three important parameters in ARIMA:

p (past values used for forecasting the next value)

q (past forecast errors used to predict the future values)

d (order of differencing)

Parameter tuning for ARIMA consumes a lot of time. So we will use auto ARIMA, which automatically searches over combinations of (p, d, q) and selects the best model according to an information criterion (AIC by default in pmdarima). To read more about how auto ARIMA works, refer to this article:
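To make the d parameter concrete: differencing replaces each value with its change from the previous value, which removes a linear trend, and d is the number of times this is applied. The plain-Python sketch below uses hypothetical prices and a helper name of our own. Note that the code in the next section also sets D=1 with m=12, which applies a seasonal difference (each value minus the value 12 steps earlier) on top of this.

```python
def difference(series, d=1):
    """Apply d rounds of first-order differencing: y'_t = y_t - y_(t-1).

    Each round shortens the series by one value.
    """
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


prices = [100, 102, 105, 109, 114]   # hypothetical values with a growing trend
print(difference(prices, d=1))       # day-to-day changes
print(difference(prices, d=2))       # changes of those changes
```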

Implementation

# pyramid-arima has since been renamed pmdarima (pip install pmdarima)
from pmdarima import auto_arima

data = df.sort_index(ascending=True, axis=0)

train = data[:987]
valid = data[987:]

training = train['Close']
validation = valid['Close']

model = auto_arima(training, start_p=1, start_q=1, max_p=3, max_q=3, m=12,
                   start_P=0, seasonal=True, d=1, D=1, trace=True,
                   error_action='ignore', suppress_warnings=True)

# forecast one value per validation day
forecast = np.asarray(model.predict(n_periods=len(valid)))
forecast = pd.DataFrame(forecast, index=valid.index, columns=['Prediction'])

Results

rms = np.sqrt(np.mean(np.power((np.array(valid['Close']) - np.array(forecast['Prediction'])), 2)))
rms

44.954584993246954

#plot
plt.plot(train['Close'])
plt.plot(valid['Close'])
plt.plot(forecast['Prediction'])

Inference

As we saw earlier, an auto ARIMA model uses past data to understand the pattern in the time series. Using these values, the model captured an increasing trend in the series. Although the predictions using this technique are far better than that of the previously implemented machine learning models, these predictions are still not close to the real values.

As is evident from the plot, the model has captured a trend in the series but does not focus on the seasonal part. In the next section, we will implement a time series model that takes both the trend and the seasonality of a series into account.

Prophet Introduction

There are a number of time series techniques that can be implemented on the stock prediction dataset, but most of them require a lot of data preprocessing before fitting the model. Prophet, designed and pioneered by Facebook, is a time series forecasting library that requires very little data preprocessing and is extremely simple to implement. The input for Prophet is a dataframe with two columns: date and target (ds and y).

Prophet tries to capture the seasonality in the past data and works well when the dataset is large. Here is an interesting article that explains Prophet in a simple and intuitive manner:

Implementation

#importing prophet (the package formerly called fbprophet is now installed and imported as prophet)
from prophet import Prophet

#creating dataframe
new_data = data[['Date', 'Close']].reset_index(drop=True)
new_data['Date'] = pd.to_datetime(new_data.Date, format='%Y-%m-%d')
new_data.index = new_data['Date']

#preparing data
new_data.rename(columns={'Close': 'y', 'Date': 'ds'}, inplace=True)

#train and validation
train = new_data[:987]
valid = new_data[987:].copy()

#fit the model
model = Prophet()
model.fit(train)

#predictions
close_prices = model.make_future_dataframe(periods=len(valid))
forecast = model.predict(close_prices)

Results

#rmse
forecast_valid = forecast['yhat'][987:]
rms = np.sqrt(np.mean(np.power((np.array(valid['y']) - np.array(forecast_valid)), 2)))
rms

57.494461930575149

#plot
valid['Predictions'] = 0
valid['Predictions'] = forecast_valid.values
plt.plot(train['y'])
plt.plot(valid[['y', 'Predictions']])

Inference

Prophet (like most time series forecasting techniques) tries to capture the trend and seasonality from past data. This model usually performs well on time series datasets, but fails to live up to its reputation in this case.

As it turns out, stock prices do not have a particular trend or seasonality. They depend heavily on what is currently going on in the market, which makes prices rise and fall. Hence forecasting techniques like ARIMA, SARIMA and Prophet would not show good results for this particular problem.

Long Short Term Memory (LSTM) Introduction

LSTMs are widely used for sequence prediction problems and have proven to be very effective. They work well because an LSTM can store past information that is important and forget information that is not. An LSTM has three gates:

The input gate: The input gate adds information to the cell state

The forget gate: It removes the information that is no longer required by the model

The output gate: It selects the information to be shown as output
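To make the three gates concrete, below is a minimal NumPy sketch of a single LSTM time step using the standard cell equations. It is not Keras' implementation: the weights are random and untrained, the ordering of gates in the stacked weight matrices is our own choice, and the scaled prices are hypothetical.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4H, D) input weights, U: (4H, H) recurrent weights, b: (4H,) biases,
    stacked here in the order [input gate, forget gate, candidate, output gate].
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to add
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell state to keep
    g = np.tanh(z[2 * H:3 * H])  # candidate values for the cell state
    o = sigmoid(z[3 * H:4 * H])  # output gate: how much of the cell state to expose
    c = f * c_prev + i * g       # updated cell state
    h = o * np.tanh(c)           # new hidden state (the output)
    return h, c


rng = np.random.default_rng(0)
D, H = 1, 4                      # one feature per step (a closing price), 4 hidden units
W = rng.normal(size=(4 * H, D))
U = rng.normal(size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for price in [0.1, 0.2, 0.3]:    # hypothetical prices already scaled to [0, 1]
    h, c = lstm_step(np.array([price]), h, c, W, U, b)
print(h.shape, c.shape)
```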

For a more detailed understanding of LSTM and its architecture, you can go through the below article:

For now, let us implement LSTM as a black box and check its performance on our particular data.

Implementation

#importing required libraries
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM

#creating dataframe
data = df.sort_index(ascending=True, axis=0)
new_data = data[['Date', 'Close']].reset_index(drop=True)

#setting index
new_data.index = new_data.Date
new_data.drop('Date', axis=1, inplace=True)

#creating train and test sets
dataset = new_data.values
train = dataset[0:987, :]
valid = dataset[987:, :]

#converting dataset into x_train and y_train
#(the scaler is fitted on the full dataset, so validation prices influence the scaling)
scaler = MinMaxScaler(feature_range=(0, 1))
scaled_data = scaler.fit_transform(dataset)

x_train, y_train = [], []
for i in range(60, len(train)):
    x_train.append(scaled_data[i-60:i, 0])
    y_train.append(scaled_data[i, 0])
x_train, y_train = np.array(x_train), np.array(y_train)
x_train = np.reshape(x_train, (x_train.shape[0], x_train.shape[1], 1))

# create and fit the LSTM network
model = Sequential()
model.add(LSTM(units=50, return_sequences=True, input_shape=(x_train.shape[1], 1)))
model.add(LSTM(units=50))
model.add(Dense(1))
# compile call missing from the extracted code; MSE loss with Adam is an assumption
model.compile(loss='mean_squared_error', optimizer='adam')
model.fit(x_train, y_train, epochs=1, batch_size=1, verbose=2)

#predicting len(valid) values, using the past 60 from the train data
inputs = new_data[len(new_data) - len(valid) - 60:].values
inputs = inputs.reshape(-1, 1)
inputs = scaler.transform(inputs)

X_test = []
for i in range(60, inputs.shape[0]):
    X_test.append(inputs[i-60:i, 0])
X_test = np.array(X_test)
X_test = np.reshape(X_test, (X_test.shape[0], X_test.shape[1], 1))
closing_price = model.predict(X_test)
closing_price = scaler.inverse_transform(closing_price)

Results

rms = np.sqrt(np.mean(np.power((valid - closing_price), 2)))
rms

11.772259608962642

#for plotting
train = new_data[:987]
valid = new_data[987:].copy()
valid['Predictions'] = closing_price
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

Inference

Wow! The LSTM model can be tuned further, for example by changing the number of LSTM layers, adding dropout, or increasing the number of epochs. But are the predictions from the LSTM enough to tell whether the stock price will rise or fall? Certainly not!

As I mentioned at the start of the article, stock prices are affected by news about the company and by other factors such as demonetization or the merger or demerger of companies. There are also intangible factors that are often impossible to predict beforehand.
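One way to put a number on the question raised above, whether the LSTM predictions can tell if the price will rise or fall, is to compare the direction of each predicted move with the direction of the actual move. The sketch below is a minimal NumPy version under one stated assumption: each forecast is compared with the previous actual close, since that is the last value known when the forecast is made. `directional_accuracy` is a hypothetical helper, not part of the original article, and it ignores the size of each move.

```python
import numpy as np


def directional_accuracy(actual, predicted):
    """Share of days on which the forecast calls the direction of the move correctly.

    For each day t >= 1 the actual move is actual[t] - actual[t-1], and the
    predicted move is predicted[t] - actual[t-1] (the forecast compared with
    the last known close). A day counts as a hit when both moves have the same sign.
    """
    actual = np.asarray(actual, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    if actual.shape != predicted.shape or actual.size < 2:
        raise ValueError("expected two equally long series with at least 2 points")
    true_move = np.sign(actual[1:] - actual[:-1])
    pred_move = np.sign(predicted[1:] - actual[:-1])
    return float(np.mean(true_move == pred_move))


# Toy data: the actual close goes up, down, up
actual = [10.0, 11.0, 10.0, 12.0]
predicted = [10.0, 10.5, 11.2, 11.0]  # the forecast calls up, up, up
print(directional_accuracy(actual, predicted))  # 2 of the 3 moves match
```

Applied to the NumPy arrays `valid` and `closing_price` from the implementation (before `valid` is reassigned to a DataFrame for plotting), `directional_accuracy(valid, closing_price)` would report how often the model called the direction correctly. Comparing that figure with 0.5 gives a rough sense of whether the model does better than guessing at this particular task.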


Time series forecasting is a very intriguing field to work with, as I have realized during my time writing these articles. There is a perception in the community that it’s a complex field, and while there is a grain of truth in there, it’s not so difficult once you get the hang of the basic techniques.

Frequently Asked Questions

Q1. Is it possible to predict the stock market with Deep Learning?

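A. Deep learning models such as LSTM can be used to forecast stock prices, alongside more traditional techniques such as moving average, linear regression, and Auto ARIMA. As discussed above, however, these forecasts cannot account for news and other intangible factors, so they should not be relied on alone.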

Q2. What can you use to predict stock prices?

A. Moving average, linear regression, KNN (k-nearest neighbors), Auto ARIMA, and LSTM (Long Short-Term Memory) are some of the most common techniques used to predict stock prices. Of these, only LSTM is a deep learning model; the others are classical statistical or machine learning methods.
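Of the techniques listed, a moving average is the simplest to sketch: each day's close is forecast as the mean of the previous `window` actual closes. The function below is a minimal one-step-ahead version written for illustration, assuming a 1-D array of closing prices. `moving_average_forecast` is a hypothetical name, and this is not necessarily the exact moving-average variant used earlier in the article.

```python
import numpy as np


def moving_average_forecast(closes, window):
    """One-step-ahead moving-average forecasts.

    The forecast for day t is the mean of the previous `window` actual closes,
    closes[t - window:t]. The result lines up with closes[window:].
    """
    closes = np.asarray(closes, dtype=float)
    if not 1 <= window < len(closes):
        raise ValueError("window must be at least 1 and smaller than len(closes)")
    return np.array([closes[t - window:t].mean() for t in range(window, len(closes))])


closes = [1.0, 2.0, 3.0, 4.0, 5.0]
print(moving_average_forecast(closes, window=2))  # [1.5 2.5 3.5]
```

Unlike the LSTM, this baseline has no trainable parameters, which can make it a useful reference point: scoring its forecasts with the same RMSE as the LSTM gives a rough idea of how much the neural network actually adds.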

Q3. What are the two main approaches to predicting stock prices?

A. Fundamental analysis and technical analysis are the two main ways of analyzing and predicting stock prices.


5 Top Deep Learning Trends

Deep learning (DL) can be defined as a form of machine learning based on artificial neural networks that use multiple processing layers to extract progressively higher-level insights from data. In essence, it is a more sophisticated application of machine learning (ML) and artificial intelligence (AI) platforms.

Here are some of the top trends in deep learning: 

Model Scale Up 

A lot of the excitement in deep learning right now is centered on scaling up large, relatively general models (now being called foundation models). They are exhibiting surprising capabilities, such as generating novel text, generating images from text, and generating video from text. So far, scaling up AI models has tended to add yet more capabilities to deep learning. This shows up in systems that go beyond simplistic responses to multi-faceted answers and actions, digging deeper into data, preferences, and possible courses of action.

Scale Up Limitations

However, not everyone is convinced that the scaling up of neural networks is going to continue to bear fruit. Roadblocks may lie ahead. 

“There is some debate about how far we can get in terms of aspects of intelligence with scaling alone,” said Peter Stone, PhD, Executive Director, Sony AI America.

“Current models are limited in several ways, and some of the community is rushing to point those out. It will be interesting to see what capabilities can be achieved with neural networks alone, and what novel methods will be uncovered for combining neural networks with other AI paradigms.”  

AI and Model Training 

AI isn’t something you plug in and, presto, instant insights. It takes time for the deep learning platform to analyze data sets, spot patterns, and begin to derive conclusions that have broad applicability in the real world. The good news is that AI platforms are rapidly evolving to keep up with model training demands. 

“Organizations can enhance their AI platforms by combining open-source projects and commercial technologies,” said Bin Fan, VP Open Source and Founding Engineer at Alluxio.

“It is essential to consider skills, speed of deployment, the variety of algorithms supported, and the flexibility of the system while making decisions.”

“Containerization being the key, Kubernetes will aid cloud-native MLOps in integrating with more mature technologies,” said Fan.

Prescriptive Modeling over Predictive Modeling

Modeling has gone through many phases over the years. Early attempts tried to predict trends from historical data. This had some value, but it didn't take into account factors such as context, sudden traffic spikes, and shifts in market forces. In particular, real-time data played no real part in early predictive modeling.

As unstructured data became more important, organizations wanted to mine it for insight. That demand, coupled with the rise in processing power, brought real-time analysis to prominence, and the immense amounts of data generated by social media have only added to the need to handle real-time information.

How does this relate to AI, deep learning, and automation?

“Many of the current and previous industry implementations of AI have relied on the AI to inform a human of some anticipated event, who then has the expert knowledge to know what action to take,” said Frans Cronje, CEO and Co-founder of DataProphet.

“Increasingly, providers are moving to AI that can anticipate a future event and take the correspondent action.” 

This opens the door to far more effective deep learning networks. With multi-layered neural networks constantly consuming real-time data, AI can take more and more of the workload away from humans. Instead of referring each decision to a human expert, a deep learning system can prescribe decisions based on predictions drawn from historical, real-time, and analytical data.
