You are reading the article Introduction To Deep Learning In Julia updated in December 2023 on the website Kientrucdochoi.com. We hope that the information we have shared is helpful to you. If you find the content interesting and meaningful, please share it with your friends and continue to follow and support us for the latest updates. Suggested January 2024 Introduction To Deep Learning In Julia
This article was published as a part of the Data Science BlogathonOverview
In the current scenario, the Data science field is dominated by Python/R but there is another competition added not so long ago, Julia! which we will be exploring in this guide. The famous quote (motto) of Julia is –
Looks like Python, runs like C
We know that python is used for a wide range of tasks. Julia, on the other hand, was primarily developed to perform scientific computation, machine learning, and statistical tasks.
Since Julia was explicitly made for high-level statistical work and scientific computations, it has several benefits over Python. In linear algebra, for example, “vanilla” (raw) Julia performs better than “vanilla” (raw) Python. This is primarily because, unlike Julia, Python doesn’t support all equations and the matrices used in machine learning.
While Python is a great language with its library Numpy, Julia completely outperforms it when it comes to non-package experience, with Julia being more catered towards machine learning tasks and computations.Table of contents
Training a Classifier
So, let’s get started!Introduction
This guide is to get you started with the mechanics of Flux, to start building models right away. While this is loosely based on a tutorial by Pytorch, it will cover all the areas necessary. It introduces basic Julia programming, as well Zygote, a source-to-source automatic differentiation (AD) framework in Julia. Using all these tools, we will build a simple neural network and in the end a CNN which we will train to classify between 10 classes.What is Flux in Julia?
Flux is an open-source machine-learning software library written completely in Julia. A stable release which we will be using is v0.12.4. As we would have expected, it has a layer-on layer stacking-based interface for simple models with strong support on interoperability with other packages of Julia, instead of having a monolithic design. For example, if we need GPU support we can get it directly via the implementation of CuArrays. This is in complete contrast to other frameworks in Julia which are implemented in different languages but bound with Julia such as Tensorflow (Julia Package) and thus are more or less limited by the functionality present in their implementation.Installation of Julia
Before we move further, if you don’t have Julia installed in your system, it can be from its official site chúng tôi .
To use Julia in Jupyter notebook like Python, we only need to add the IJulia package as follows and we can run Julia right from the jupyter notebook.using Pkg Pkg.add("IJulia")
We can use Julia as we used Python in Jupyter notebook for exploratory data analysis.Arrays in Julia
Before moving on to the framework, we need to understand the basics of a deep learning framework. Arrays, CudaArrays, etc. In this section, I’ll explain the basics of the array in Julia.
with just three elements.x = [10,12,16]
Here’s a matrix – a square array with four elements.x = [10 12; 13 14]
elements, each a random number ranging from 0 to 1.x = rand(12, 2)
rand is not just the only function that can create a random matrix (array) we can use different functions like ones, zeros, or randn. Try them out in the jupyter notebook to see what they do.
By default, Julia stores all the numbers in a high-precision format called Float64. In Machine Learning we often don’t need all those many digits, so we can configure Julia to decrease it to Float32, or if we need higher precision than 64 bits we can use BigFloat. Below is an example of a random matrix of 6×3 = 18 elements of BigFloat.x = rand(BigFloat, 6, 3) x = rand(Float32, 6, 3)
To count the number of elements in a matrix we can use the length function.length(x)
Or, if we need the size we can check it more explicitly.size(x)
We can do many sorts of algebraic operations on matrix, for example, we can add two matricesx + x
Or subtract them.x - x
Julia supports a feature called broadcasting, using the “.” syntax. The broadcast() is an inbuilt function in julia that is used to broadcast or apply the function f over the collections, arrays, or tuples. This makes it easy to apply a function to one or more arrays with a concise dot. syntax. For example – f.(a, b) means “apply f elementwise to a and b”. We can use this broadcasting in our matrix to add 1 element-wise in x.x .+ 1
Finally, we have to use Matrix Multiplication more or less every time we use Machine Learning. Is super-easy to use with Julia.W = randn(4, 10) x = rand(10) W * x CUDA Arrays in Julia
CUDA functionality is provided separately by the CUDA package from Julia. If you have a GPU and CUDA available, you can run ] add CUDA in IJulia (jupyter notebook) to get it. Once you get the CUDA installed (compatible versions below julia 1.6) we can transfer our arrays into CUDA arrays (or in GPU) using cu function. It supports all the basic functionalities of an array but now works on GPU.
GPU hardware. In this section, I will briefly demonstrate the use of the CuArray type. Since we are exposing CUDA’s functionality by implementing existing Julia interfaces on the CuArray type, we should refer to the upstream Julia documentation for more in-depth information on these operations.import Pkg Pkg.add("CUDA")
Import the library and covert matrix into CudaArrays. Now, these CuArrays will run on GPU which by default is much faster than Arrays and we barely had to do anything in it.using CUDA x = cu(rand(6, 3)) Flux.jl in Julia
Flux is a library or package in Julia specifically for machine learning. It comes with a vast range of functionalities that help us harness the full potential of Julia, without getting our hands messy (like auto-differentiation). We follow a few key principles in Flux.jl
and will be faster.
You could have written Flux from scratch – From LSTMs to GPU Kernels, it is a very straightforward Julia code. Whenever in doubt, one can always look to the documentation. If you need something different, you can easily write your own code in it.
Integrates nicely with others – Flux works well with Julia libraries from Data Frames to Images (Images package) and even differential equations solver (another package in Julia for computation), so you can easily build complex data processing pipelines that integrate Flux models.
You can add Flux from using Julia’s package manager, by typing ] add Flux in the Julia prompt or useimport Pkg Pkg.add("Flux") Automatic Differentiation
Automatic differentiation (AD), also called algorithmic differentiation or simply “auto diff”, is used to calculate differentiation of functions. It is a family of techniques similar to backpropagation for efficiently evaluating derivatives of numeric functions expressed as a form of computer programs.
One probably has learned to differentiate functions in calculus classes but let’s recap it in Julia code.f(x) = 4x^2 + 12x + 3 f(4)
In simpler cases like these, we can easily find the gradient by hand, for example in this it is 8x + 12. But it’s much faster and efficient to make the Flux do it for us!using Flux: gradient df(x) = gradient(f, x) df(4)
We can cross-check with few more inputs, to see if the gradient calculated by Flux is correct and is indeed 8x+12. We can do it multiple times and since the function we took was the C_2 function second derivative is just an integer 8.ddf(x) = gradient(df, x) ddf(4)
As long as the mathematical functions we create in Julia are differentiable we can use auto differentiation of Flux to handle any code we throw at it, which includes recursion, loops, and even custom layers. For example, we can try to differentiate the Taylor series approximation of sin function.mysin(x) = sum((-1)^k*x^(1+2k)/factorial(1+2k) for k in 0:6) x = 0.6 mysin(x), gradient(mysin, x) sin(x), cos(x)
As we expected the derivative is numerically very close to the function cos(x) (which is sinx derivative).
What if instead of just taking a single number as input, we take arrays as inputs? This gets more interesting as we proceed further. Let’s take an example where we have a function that takes a matrix and two vectors.myloss( W , b , x ) = sum(W * x .+ b) #calculating loss W = randn(3, 5) b = zeros(3) x = rand(5) gradient(myloss, W, b, x)
Now we get gradients for each of the inputs W, b, and x, and these will come in very handy when we have to train our model. Since we know that machine learning models can contain hundreds or thousands of parameters, Flux here provides a slightly different method of writing gradient. Just like other deep learning frameworks, we mark our arrays with params to indicate that we want its gradients. W and b represent the weight and bias respectively.using Flux: params W = randn(3, 5) b = zeros(3) x = rand(5) y(x) = sum(W * x .+ b)
Using those parameters we can now get the gradients of W and b directly. It’s especially useful when we are working with layers. Think of the layer as a container for parameters. For example, the Dense function from Flux does familiar linear transform.using Flux m = Dense(10, 5) x = rand(Float32, 10)
To get parameters of any layer or model we can always simply use params from Flux.params(m)
So even if our network has many many parameters we can easily calculate their gradient for all parameters.x = rand(Float32, 10) #ran array m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax) #creating a layer l(x) = sum(Flux.crossentropy(m(x), [0.5, 0.5])) #loss function l(x)
We don’t explicitly have to use layers but sometimes they can be very convenient for many simple kinds of models and faster iterations.
The next step would be to update the weights of the network and perform optimization using different algorithms. The first optimization algorithm which comes to mind is Gradient Descent because of its simplicity. We take the weights and steps using a learning rate which is hyper-param and the gradients. weights = weights – learning_rate x gradient.using Flux.Optimise: update!, Descent η = 0.1 #learning rate for p in params(m) end
While the method we used above to update the param in place using gradients is valid, it can get way more complicated as the algorithms we use gets more involved in it. Here, Flux comes to the rescue with its prebuilt set of optimizers which makes our work way too easy. All we need to do is give the algorithm a learning rate and that’s it.opt = Descent(0.01)
So training a new network finally reduces down to iteration on the given dataset multiple times (epochs) and performing all the steps in order (given below in code). For the sake of simplicity and clarity, we do a quick implementation in Julia, let’s train a network that learns to predict 0.5 for every input of 10 floats. Flux has a function called train! to do all this for us.data, labels = rand(10, 100), fill(0.5, 2, 100) #dataset loss(x, y) = sum(Flux.crossentropy(m(x), y)) #creating loss function Flux.train!(loss, params(m), [(data,labels)], opt) #training the model
You don’t have to use the train! In cases where arbitrary logic might be better suited, you could open up this training loop like so:for d in training_set #assuming d looks like ( data, labels) # our logic here gs = gradient( params( m ) ) do # m is our model l = loss(d...) end update!( opt, params(m), gs) end
And this concludes the basics of Flux usage, in the next section, we will learn to implement it to train a classifier for the CIFAR10 dataset.Training a Classifier for the Deep Learning Model
Getting a real classifier to work might help fix the workflow in Julia a bit more. CIFAR10 is a dataset of 50k tiny training images split into 10 classes of dogs, birds, deer, etc. The reader is requested to check the image below for more details.
We will do the following steps in order to get a classifier trained –
Load the dataset of CIFAR10 (both training and test dataset)
Create a Convolution Neural Network (CNN)
Define a loss function to calculate losses
Use training data to train our network
Evaluate our model on the test dataset
Useful Libraries to install before we proceed, installation is simple but might take few minutes to completely install.] add Metalhead #to get the data ] add Images #Image processing package ] add ImageIO #to output images
Loading the Dataset
Metalhead.jl (Package) is an excellent package that has tons of classic predefined and pre-trained CV (computer vision) models. It also consists of a variety of data loaders that come in handy during the dataset load process.using Statistics using Flux, Flux.Optimise #deep learning framework using Metalhead, Images #to load dataset using Metalhead: trainimgs using Images.ImageCore #to work on image processing using Flux: onehotbatch, onecold #to encode using Base.Iterators: partition using CUDA #for GPU functionality
This image will give us an idea of the different types of labels we are dealing with.Metalhead.download(CIFAR10) #download the dataset CIFAR10 X = trainimgs(CIFAR10) #take the training dataset as X labels = onehotbatch([X[i].ground_truth.class for i in 1:50000],1:10) #encode the dataset
To get more information about what we are dealing with let’s take a look at a random image from the dataset.image(x) = chúng tôi # handy for use later ground_truth(x) = x.ground_truth image.(X[rand(1:end, 10)]) #to show the images in IJulia itself
With 3 RGB layers of the matrix (32x32x3), together create the image vector we see above. Now since the dataset is too large, we can pass them in batches (take 1000) and keep a set for validation to check the evaluation of our model. This process of passing them in batches is called mini-batch learning and is very popular in machine learning. So, in layman terms, rather than sending our entire dataset which is big and might not fit in RAM, we break the dataset into small packets (mini-batches), usually chosen randomly, and then train our model on it. It is observed that they help with escaping the saddle points (it is the minimax point on the surface of the curve).
First, we define a ‘getarray’ function that would help in converting the matrices to Float type.getarray(X) = float.( permutedims( channelview( X ), (2, 3, 1))) #get the matrix to float type imgs = [ getarray(X[i].img ) for i in 1:50000] #get all the matrices into float
In our batch of 1000, the first 49,000 images will make our training set and the rest will be saved for validation or test set. To achieve this we can use the function called ‘partition’ which handily breaks down the set we give it in consecutive parts (1000). and to concatenate we use use ‘cat’ function along any dimension.
valset = 49001:50000
Defining the Classifier
Now comes the part where we can define our Convolutional Neural Network (CNN).
Definition of a convolutional neural network is – one that defines a kernel and slides it across a matrix to create an intermediate representation to extract features from. As it goes into deeper layers it creates higher-order features which make it suitable for images (although it can be used in plenty of other situations), where the structure of the subject is what will help us determine which class it belongs to.m = Chain( #crearting a CNN MaxPool((2,2)), #first layer of CNN MaxPool((2, 2)), #second layer of CNN Dense(200, 120), #first layer Dense(120, 84), #second layer Dense(84, 10), #third and final layer with 10 classification labels.
Whenever we have to work with data that has multiple independent classes, cross-entropy comes in handy. And for the momentum, as the name suggests, it gradually lowers the learning rate as we proceed further with the training. This is necessary in case we overshoot from the desired destination and the chances for local minima increase while helping us to maintain a bit of adaptivity in our optimization.using Flux: crossentropy, Momentum #import the optimizers loss(x, y) = sum(crossentropy(m(x), y)) #using loss function opt = Momentum(0.01) #fixing the momentum
Before starting our training loop, we will need some sort of basic accuracy numbers about our model to keep the track of our progress. We can design our custom function to achieve just the same.accuracy(x, y) = mean( onecold(m (x), 1:10) .== onecold(y, 1:10))
Training the Classifier
This is the part where we finally stitch everything together, here we do all the interesting operations which we defined previously to see what our model is capable of doing. Just for the tutorial, we will only be using 10 iterations over dataset (epochs) and optimize it, although for greater accuracy you can increase the epochs and play with hyperparameters a bit.epochs = 10 #number of iterations for epoch = 1:epochs for d in train gs = gradient(params(m)) do l = loss(d...) #calculate losses end update!(opt, params(m), gs) #upadate the params weights end @show accuracy(valX, valY) #show the accuracy of model after each epoch end
Step by step training process gives us a brief idea of how the network was learning the function. This accuracy is not bad at all for a model which was small and had no hyperparameter tuned with smaller epochs.
Training on a GPU
Testing the Network
As we have trained our neural network for 100 passes over the training dataset. But we would need to check if our model has learned anything at all. To check this, we simply predict the labels corresponding to each class from our neural net output, and checking it against the true values of class labels. If the prediction is correct, we add that sample to the correct prediction (true values) list. This will be done on the still unseen part of the data.
Firstly, we would have to get the same processing of images as we did on the training data set to compare them side by side.valset = valimgs(CIFAR10) #value set valimg = [ getarray(valset[i].img) for i in 1:10000] #get them to array labels = onehotbatch([valset[i].ground_truth.class for i in 1:10000],1:10)#encode them test = gpu.( [(cat(valimg[i]..., dims = 4), labels[:,i]) for i in partition(1:10000, 1000)])
Next, we display some of the images from our validation dataset.ids = rand(1:10000, 10) #random image ids image.(valset[ids]) #show images in vector form
We have 10 values as the output for all 10 classes. If the particular value is higher for a class, our network thinks that image is from that particular class. The below image shows the values (energies) in 10 floats and every column corresponds to the output of one image.
Let’s see how our model fared on the dataset.rand_test = getarray.( image.(valset[ids])) #get the test images rand_truth = ground_truth.(valset[ids]) #check the values against true values m(rand_test)
This looks very similar to how we would have expected our results to be. Even after the small training period, let’s see how our model actually performs on any new data given, (that was prepared by us).accuracy( test...)#testing accuracy
49% is clearly much better than the chances of randomly having it correct which is 10% (since we have 10 classes) which is not bad at all for the small hand-coded models without hyper-parameter tuning like ours.
Let’s take a look at how the net performed on all the classes performed individually.class_correct = zeros(10) #creating an array of zeros class_total = zeros(10) for i in 1:10 preds = m(test[i]) #prediction after feeding it in our model lab = test[i] for j = 1:1000 pred_class = findmax(preds[:, j]) #find the argmax for each class actual_class = findmax(lab[:, j]) #true vale of class if pred_class == actual_class #if both are equal then then increment values by 1 class_correct[pred_class] += 1 end class_total[actual_class] += 1 end end class_correct ./ class_total #getting total number of ratios (/100) times we get it correct
The spread seems pretty good, but some classes are performing significantly better than others. It is left for the reader to explore the reason.Conclusion
In this article, we learned how powerful Julia is when it comes to computation. We learned about the Flux package and how to use it to train our hand-written model to classify between 10 different classes in just a few lines of code, that too on GPU!. We also learned about CuArrays and their significance in decreasing computation time. Hope this article has been helpful in starting your journey with Flux (Julia).
Thanks to the Mike Innes, Andrew Dinhobl, Ygor Canalli et al. for valuable documentation. Reach out to me via LinkedIn (Nihal Singh).
You're reading Introduction To Deep Learning In Julia
This article was published as a part of the Data Science Blogathon.
OpenCV is a massive open-source library for various fields like computer vision, machine learning, image processing and plays a critical function in real-time operations, which are fundamental in today’s systems. It is deployed for the detection of items, faces, Diseases, lesions, Number plates, and even handwriting in various images and videos. With help of OpenCV in Deep Learning, we deploy vector space and execute mathematical operations on these features to identify visual patterns and their varTable of Contents
What is Computer Vision?
Installing and Importing the OpenCV Image Preprocessing Package
Reading an Input Image
Image Data Type
Image Pixel Values
Viewing the Images
Image Operations Using OpenCV and Python
Functionality of OpenCV
SummaryWhat is Computer Vision?
Computer vision is an approach to understanding how photos and movies are stored, as well as manipulating and extracting information from them. Artificial Intelligence depends on or is mostly based on computer vision. Self-driving cars, robotics, and picture editing apps all rely heavily on computer vision
Human vision has a resemblance to that of computer vision. Human vision learns from the various life experiences and deploys them to distinguish objects and interpret the distance between various objects and estimate the relative position.
Source: Analytics Vidhya
With cameras, data, and algorithms, computer vision trains machines to accomplish these jobs in much less time.
Computer vision allows computers and systems to extract useful data from digital images and video inputs.
Installing and Importing the OpenCV Image Preprocessing Package
OpenCV in deep learning is an extremely important important aspect of many Machine Learning algorithms. OpenCV is an open-source library (package) for computer vision, machine learning, and image processing applications that run on the CPU exclusively. It works with many different programming languages, including Python. It can be imported with single line command as being depicted belowpip install opencv-python
A package in Python is a collection of modules that contain pre-written programmes. These packages allow you to import modules separately or in their whole. Importing the package is as simple as calling the “cv2” module as seen below:import cv2 as cv
Reading an Input Image
Colour photographs, grayscale photographs, binary photographs, and multispectral photographs are all examples of digital images. In a colour image, each pixel contains its colour information. Binary images have only two colours, usually black and white pixels, and grayscale images have only shades of grey as their only colour. Multispectral pictures gather image data spanning the electromagnetic spectrum within a specific wavelength.
To read the image, we use the “imread” method from the cv2 package, where the first parameter is the image’s path, including filename and extension, and the second parameter is a flag that determines how to read in the image.
The features of a picture that is being utilised as an inputimport cv2 # To read image cv2.imread function, img = cv2.imread("pythonlogo.png", cv2.IMREAD_COLOR) # Creating GUI window to display an image on screen cv2.imshow("Cute Kitens", img)
Output:Image Data Type
To discover the image’s type, use the “dtype” technique. This strategy enables us to comprehend the representation of visual data and the pixel value.
in addition to the image kind, It’s a multidimensional container for things of comparable shape and size.Pixel values for the image
A collection of small samples can be thought of as an image. These samples are referred to as pixels. To have a better understanding of an image, try zooming in as much as possible. Divided into several squares, the same can be seen. These are pixels, and when all of them are combined, they form an image. One of the simplest methods to represent an image is via a matrix.
Code:print("The data type of the image is",image.dtype) Output: The data type of the image is uint8 uint8 is representing each pixel value being an Unsigned Integer of 8 bits. This data type ranges between 0 to 255 Image Resolution
Image resolution is defined as the number of pixels in an image. As the number of pixels rises, the image quality improves. As we saw before, the image’s shape determines the number of rows and columns. Pixel values in images: 320 x 240 pixels (mostly suitable for small screen devices), 1024 x 768 pixels (appropriate for viewing on standard computer monitors), 720 x 576 pixels (good for viewing on standard definition TV sets with 4:3 aspect ratio), 1280 x 720 pixels (for viewing on widescreen monitors), 1280 x 1024 pixels (for viewing on full-screen monitors) Pixel values in images.Image Pixel Values
A collection of small samples can be thought of as an image. The unit of measurement for these samples is pixels. For improved comprehension, try zooming in on a picture as much as possible. The same can be divided into several different squares. These are pixels that, when combined, make up an image.
The quality of an image decreases as the number of pixels in the image increases. The image’s shape, which we saw earlier, determines the number of rows and columns.Viewing the Images
Let’s have a look at how to make the image appear in a window. We’ll need to create a graphical user interface (GUI) window to display the image on the screen to do so. The title of the GUI window screen must be the first parameter, and it must be specified in string format. The image can be displayed in a pop-up window using the cv2.imshow() method. However, if you try to close it, you can get stuck with its window. We can use the “waitKey” method to mitigate this.
The “waitKey” parameter has been set to ‘0’ to keep the window open until we close it. (You can specify the time in milliseconds instead of 0, indicating how long it should be open for.)# To read image from disk, we use # cv2.imread function, in below method, img = cv2.imread("python logo.png", cv2.IMREAD_COLOR) # Creating GUI window to display an image on screen # first Parameter is windows title (should be in string format) # Second Parameter is image array cv2.imshow("The Logo", img) # To hold the window on screen, we use cv2.waitKey method, If 0 pass an parameter, then it will # hold the screen until user close it. cv2.waitKey(0) # for removing/deleting created GUI window from screen # and memory cv2.destroyAllWindows()
Output: GUI Window, Source: Author
Reconstructing the image bit planes after extracting the image bit planes
An image can be divided into several levels of bit planes. Divide an image into 8-bit (0-7) planes, with the last few planes containing the majority of the image’s data.
Image Operations Using OpenCV and Python
Checking Properties of the Input Image
import cv2import numpy as np import matplotlib.pyplot as plt img = plt.imread("my pic.jpg") plt.imshow(img) print(img.shape) print(img.size) print(img.dtype)
Output:(1921, 1921, 3) 11070723 uint8
Basic Image Processing
import numpy as np
image = cv2.imread(“baby yoda.jpg”)
#cv2.imshow(‘Example – Show image in window’,image)
img2 = cv2.cvtColor(image,cv2.COLOR_BGR2RGB)
Source: AuthorDilation and Erosion of the Input Image
Input Image:import cv2 import numpy as np import matplotlib.pyplot as plt img = plt.imread("baby yoda.jpg") # Taking a matrix of size 5 as the kernel kernel = np.ones((5,5), np.uint8) # first parameter is basicaly the original image, # kernel is the matrix with which image is convolved # and third parameter is the number of iterations, which will determine how much # you want to erode/dilate a given image. img_erosion = cv2.erode(img, kernel, iterations=1) img_dilation = cv2.dilate(img, kernel, iterations=1) plt.imshow(img) plt.imshow(img_erosion) plt.imshow(img_dilation)
Source: Author, Image after erosion effect
Source: Author, Image after dilation effectOpenCV Applications
• The concept of OpenCV in Deep Learning is applied for recognition of faces.
• Counting the number of people (foot traffic in a mall, for example)
• Counting the number of automobiles on motorways and their speeds
• Interaction-based art installations
• Anomalies (defects) are detected during the production process (the odd defective products)
• Stitching an image from a street view
• Street view image stitching
• Video/image search and retrieval
• Robot and autonomous car navigation and control
• object recognition
• Medical image analysis
• Movies – 3D structure from motionFunctionality of OpenCV
• I/O, processing and display of images and videos
• Detection of objects and features
• Computer visi
• Computer-assisted photographySummary
So in this article, we covered the basic Introduction about OpenCV Library and its application in real-time scenarios. We also covered other key terminologies and fields where OpenCV in deep learning is being deployed(Computer Vision) as well as implemented python code for performing some of the basic image operations(dilation, erosion, and changing image colours) with the help of the OpenCV library. Apart from that OpenCV in deep learning would also find application in a variety of industries.
My name is Pranshu Sharma and I am a Data Science Enthusiast
For any feedback Email me at [email protected]
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.
This article was published as a part of the Data Science Blogathon.INTRODUCTION
Deep Learning is a subset of Machine Learning based on Artificial Neural Networks. The main idea behind Deep Learning is to mimic the working of a human brain. Some of the use cases in Deep Learning involves Face Recognition, Machine Translation, Speech Recognition, etc. Learning can be supervised,semi-supervised, or unsupervised.
What is Neural Style Transfer?
If you are an artist I am sure you must have thought like, What if I can paint like Picasso? Well to answer that question Deep Learning comes with an interesting solution-Neural Style Transfer.
In layman’s terms, Neural Style Transfer is the art of creating style to any content. Content is the layout or the sketch and Style being the painting or the colors. It is an application of Image transformation using Deep Learning.How does it work?
Unsurprisingly there have been quite a few approaches towards NST but we would start with the traditional implementation for basic understanding and then we will explore more!
The base idea on which Neural Style Transfer is proposed is “it is possible to separate the style representation and content representations in a CNN, learned during a computer vision task (e.g. image recognition task).“
I am assuming you must have heard about the ImageNet Competition from where we were introduced to the state of the art models starting from AlexNet then VGG then RESNET and many more. There is something common in all these models is that they are trained on a large ImageNet Dataset (14 million Images with 1000 classes) which makes them understand the ins and out of any image. We leverage this quality of these models by segregating the content and the style part of an image and providing a loss function to optimize the required result.
As stated earlier, we define a pre-trained convolutional model and loss functions which blends two images visually, therefore we would be requiring the following inputs
A Content Image – image on which we will transfer style
A Style Image – the style we want to transfer
An Input Image(generated) – The final content plus the required style imageIMPLEMENTATION Model
Like I said we will be using pre-trained convolutional neural networks. A way to cut short this process is the concept of transfer learning where libraries like keras have provided us with these giants and let us experiment with them on our own problem statements. Here we will be using keras for transfer learning…we can load the model using the following lines of code…
The first two lines involve importing libraries like keras. Then we will load the model using vgg19.VGG19() where include_top = False depicts that we don’t want the final softmax layer which is the output layer used to classify the 1000 classes in the competition.
The fourth line makes a dictionary that will store the key as layer name and value as layer outputs. Then we finally define our model with inputs as VGG input specification and outputs as the dictionary we made for each layer.
Next, we will define the layers from which we will extract our content and style characteristics.
We have already made the dictionary where we can map these layers and extract the outputs.Loss Functions
To get the desired image we will have to define a loss function which will optimize the losses towards the required result. Here we will be using the concept of per pixel losses.
Per Pixel Loss is a metric that is used to understand the differences between images on a pixel level. It compares the output pixel values with the input values. (Another method is perpetual loss functions we will discuss briefly at the later stages of the blog). Sometimes per pixel loss has its own drawbacks in terms of representing every meaningful characteristic. That’s where perpetual losses come into the picture. The loss terms we will be focusing on will be-
Style LossContent Loss
It makes sure the content we want in the generated image is captured efficiently. It has been observed that CNN captures information about the content in the higher levels of the network, whereas the lower levels are more focused on the individual pixel values.
Here the base is the content features while the combination is the generated output image features. Here the reduce_sum computes the sum of elements across the dimensions of the specified parameters which is in this case the difference of corresponding pixels between input(content) and generated image.Style Loss
Defining the loss function for style has more work than content as multiple layers are involved in computing. The style information is measured as the amount of correlation present between the feature maps per layer. Here we use the Gram Matrix for computing style loss. So what is a gram matrix?
Gram matrix is the measure by which we capture the distribution of features over a set of feature maps in a given layer. So while you are basically computing or minimizing the style loss you are making the level of distribution of features the same in both of the styles and generated images.
So the idea is to make gram matrices of style and generated images and then compute the difference between the two. The Gram matrix(Gij) is the multiplication of the ith and jth feature map of a layer and then summed across height and width as shown above.
Now we have computed both the loss functions. Therefore to calculate the final loss we will compute a weighted summation of both the computed content and style losses.
The above code is the final integration of losses by traversing through the layers and computing the final loss by taking a weighted summation in the second last line. Finally, we would have to define an optimizer(Adam or SGD) that would optimize the loss of the network.OUTPUT
There are many other faster proposals of NST which I would like you to explore and come up with faster mechanisms. One concept to follow is that there is a perpetual loss concept using an Image Transformer neural network which increases the speed of NST and it allows you to train your Image transformer neural network per content and apply various styles without retraining.
It is more helpful in deploying environments as the traditional model trains for each pair of content and style while this concept allows one-time content training followed by multiple style transformations on the same content.BRIEF INTRODUCTION TO FAST NST
Training a style transfer model requires two networks: a pre-trained feature extractor and a transfer network. The pre-trained feature extractor is used to avoid having to use paired training data. Its usefulness arises from the curious tendency for individual layers of deep convolutional neural networks trained for image classification to specialize in understanding specific features of an image.
The pre-trained model enables us to compare the content and style of two images, but it doesn’t actually help us create the stylized image. That’s the job of a second neural network, which we’ll call the transfer network. The transfer network is an image translation network that takes one image as input and outputs another image. Transfer networks typically have an encode-decoder architecture.
At the beginning of training, one or more style images are run through the pre-trained feature extractor, and the outputs at various style layers are saved for later comparison. Content images are then fed into the system. Each content image passes through the pre-trained feature extractor, where outputs at various content layers are saved. The content image then passes through the transfer network, which outputs a stylized image. The stylized image is also run through the feature extractor, and outputs at both the content and style layers are saved.
The quality of the stylized image is defined by a custom loss function that has terms for both content and style. The extracted content features of the stylized image are compared to the original content image, while the extracted style features are compared to those from the reference style image(s). After each step, only the transfer network is updated. The weights of the pre-trained feature extractor remain fixed throughout. By weighting the different terms of the loss function, we can train models to produce output images with lighter or heavier stylization.
Congratulations you have learned what a Neural Style Transfer is and how it works. But that is certainly not the end, next comes exploring the topic with more recent research papers, blogs, and faster implementations. For that too you have a kick start. I hope you enjoyed the blog which targeted the basic traditional workflow of a Neural Style Transfer and I hope I was able to induce an intuition towards understanding NST.
Thank you for reading!
Deep learning (DL) could be defined as a form of machine learning based on artificial neural networks which harness multiple processing layers in order to extract progressively better and more high-level insights from data. In essence it is simply a more sophisticated application of artificial intelligence (AI) platforms and machine learning (ML).
Here are some of the top trends in deep learning:
Model Scale Up
A lot of the excitement in deep learning right now is centered around scaling up large, relatively general models (now being called foundation models). They are exhibiting surprising capabilities such as generating novel text, images from text, and video from text. Anything that scales up AI models adds yet more capabilities to deep learning. This is showing up in algorithms that go beyond simplistic responses to multi-faceted answers and actions that dig deeper into data, preferences, and potential actions.
Scale Up Limitations
However, not everyone is convinced that the scaling up of neural networks is going to continue to bear fruit. Roadblocks may lie ahead.
“There is some debate about how far we can get in terms of aspects of intelligence with scaling alone,” said Peter Stone, PhD, Executive Director, Sony AI America.
“Current models are limited in several ways, and some of the community is rushing to point those out. It will be interesting to see what capabilities can be achieved with neural networks alone, and what novel methods will be uncovered for combining neural networks with other AI paradigms.”
AI and Model Training
AI isn’t something you plug in and, presto, instant insights. It takes time for the deep learning platform to analyze data sets, spot patterns, and begin to derive conclusions that have broad applicability in the real world. The good news is that AI platforms are rapidly evolving to keep up with model training demands.
“Organizations can enhance their AI platforms by combining open-source projects and commercial technologies,” said Bin Fan, VP Open Source and Founding Engineer at Alluxio.
“It is essential to consider skills, speed of deployment, the variety of algorithms supported, and the flexibility of the system while making decisions.”
“Containerization being the key, Kubernetes will aid cloud-native MLOps in integrating with more mature technologies,” said Fan.
Prescriptive Modeling over Predictive Modeling
Modeling has gone through many phases over the last many years. Initial attempts tried to predict trends from historical data. This had some value, but didn’t take into account factors such as context, sudden traffic spikes, and shifts in market forces. In particular, real-time data played no real part in early efforts at predictive modeling.
As unstructured data became more important, organizations wanted to mine it to glean insight. Coupled with the rise in processing power, suddenly real time analysis rose to prominence. And the immense amounts of data generated by social media has only added to the need to address real time information.
How does this relate to AI, deep learning, and automation?
“Many of the current and previous industry implementations of AI have relied on the AI to inform a human of some anticipated event, who then has the expert knowledge to know what action to take,” said Frans Cronje, CEO and Co-founder of DataProphet.
“Increasingly, providers are moving to AI that can anticipate a future event and take the correspondent action.”
This opens the door to far more effective deep learning networks. With real time data being constantly used by multi-layered neural networks, AI can be utilized to take more and more of the workload away from humans. Instead of referring the decision to a human expert, deep learning can be used to prescribe predicted decisions based on historical, real-time, and analytical data.
“Memory Error” – that all too familiar dreaded message in Jupyter notebooks when we try to execute a machine learning or deep learning algorithm on a large dataset. Most of us do not have access to unlimited computational power on our machines. And let’s face it, it costs an arm and a leg to get a decent GPU from existing cloud providers. So how do we build large deep learning models without burning a hole in our pockets? Step up – Google Colab!
It’s an incredible online browser-based platform that allows us to train our models on machines for free! Sounds too good to be true, but thanks to Google, we can now work with large datasets, build complex models, and even share our work seamlessly with others. That’s the power of Google Colab.What is Google Colab?
Google Colaboratory is a free online cloud-based Jupyter notebook environment that allows us to train our machine learning and deep learning models on CPUs, GPUs, and TPUs.
Here’s what I truly love about Colab. It does not matter which computer you have, what it’s configuration is, and how ancient it might be. You can still use Google Colab! All you need is a Google account and a web browser. And here’s the cherry on top – you get access to GPUs like Tesla K80 and even a TPU, for free!
TPUs are much more expensive than a GPU, and you can use it for free on Colab. It’s worth repeating again and again – it’s an offering like no other.
Are you are still using that same old Jupyter notebook on your system for training models? Trust me, you’re going to love Google Colab.What is a Notebook in Google Colab? Google Colab Features
Colab provides users free access to GPUs and TPUs, which can significantly speed up the training and inference of machine learning and deep learning models.
Colab’s interface is web-based, so installing any software on your local machine is unnecessary. The interface is also intuitive and user-friendly, making it easy to get started with coding.
Colab allows multiple users to work on the same notebook simultaneously, making collaborating with team members easy. Colab also integrates with other Google services, such as Google Drive and GitHub, making it easy to share your work.
Colab notebooks support markdown, which allows you to include formatted text, equations, and images alongside your code. This makes it easier to document your work and communicate your ideas.
Colab comes pre-installed with many popular libraries and tools for machine learning and deep learning, such as TensorFlow and PyTorch. This saves time and eliminates the need to manually install and configure these tools.GPUs and TPUs on Google Colab
Ask anyone who uses Colab why they love it. The answer is unanimous – the availability of free GPUs and TPUs. Training models, especially deep learning ones, takes numerous hours on a CPU. We’ve all faced this issue on our local machines. GPUs and TPUs, on the other hand, can train these models in a matter of minutes or seconds.
If you still need a reason to work with GPUs, check out this excellent explanation by Faizan Shaikh.
It gives you a decent GPU for free, which you can continuously run for 12 hours. For most data science folks, this is sufficient to meet their computation needs. Especially if you are a beginner, then I would highly recommend you start using Google Colab.
Google Colab gives us three types of runtime for our notebooks:
As I mentioned, Colab gives us 12 hours of continuous execution time. After that, the whole virtual machine is cleared and we have to start again. We can run multiple CPU, GPU, and TPU instances simultaneously, but our resources are shared between these instances.
Let’s take a look at the specifications of different runtimes offered by Google Colab:
It will cost you A LOT to buy a GPU or TPU from the market. Why not save that money and use Google Colab from the comfort of your own machine?How to Use Google Colab?
You can go to Google Colab using this link. This is the screen you’ll get when you open Colab:
You can also import your notebook from Google Drive or GitHub, but they require an authentication process.Google Colab Runtimes – Choosing the GPU or TPU Option
The ability to choose different types of runtimes is what makes Colab so popular and powerful. Here are the steps to change the runtime of your notebook:
Step 2: Here you can change the runtime according to your need:
A wise man once said, “With great power comes great responsibility.” I implore you to shut down your notebook after you have completed your work so that others can use these resources because various users share them. You can terminate your notebook like this:Using Terminal Commands on Google Colab
You can use the Colab cell for running terminal commands. Most of the popular libraries come installed by default on Google Colab. Yes, Python libraries like Pandas, NumPy, scikit-learn are all pre-installed.
If you want to run a different Python library, you can always install it inside your Colab notebook like this:
Pretty easy, right? Everything is similar to how it works in a regular terminal. We just you have to put an exclamation(!) before writing each command like:
!pwdCloning Repositories in Google Colab
You can also clone a Git repo inside Google Colaboratory. Just go to your GitHub repository and copy the clone link of the repository:
Then, simply run:
And there you go!Uploading Files and Datasets
Here’s a must-know aspect for any data scientist. The ability to import your dataset into Colab is the first step in your data analysis journey.
The most basic approach is to upload your dataset to Colab directly:
You can also upload your dataset to any other platform and access it using its link. I tend to go with the second approach more often than not (when feasible).Saving Your Notebook
All the notebooks on Colab are stored on your Google Drive. The best thing about Colab is that your notebook is automatically saved after a certain time period and you don’t lose your progress.
If you want, you can export and save your notebook in both *.py and *.ipynb formats:
Not just that, you can also save a copy of your notebook directly on GitHub, or you can create a GitHub Gist:
I love the variety of options we get.Exporting Data/Files from Google Colab
You can export your files directly to Google Drive, or you can export it to the VM instance and download it by yourself:
Exporting directly to the Drive is a better option when you have bigger files or more than one file. You’ll pick up these nuances as you work on bigger projects in Colab.Sharing Your Notebook
Google Colab also gives us an easy way of sharing our work with others. This is one of the best things about Colab:What’s Next?
Google Colab now also provides a paid platform called Google Colab Pro, priced at $9.99 a month. In this plan, you can get the Tesla T4 or Tesla P100 GPU, and an option of selecting an instance with a high RAM of around 27 GB. Also, your maximum computation time is doubled from 12 hours to 24 hours. How cool is that?
You can consider this plan if you need high computation power because it is still quite cheap when compared to other cloud GPU providers like AWS, Azure, and even GCP.Recommendations
If you’re new to the world of Deep Learning, I have some excellent resources to help you get started in a comprehensive and structured manner:
This article was published as a part of the Data Science Blogathon.Introduction to Deep Learning Algorithms
The goal of deep learning is to create models that have abstract features. This is accomplished by building models composed of many layers in which higher layers interpret the input while lower layers abstract the details.
As we train these deep learning networks, the high-level information from the input image produces weights that determine how information is interpreted.
These weights are generated by stochastic gradient descent algorithms based on backpropagation for updating the network parameters.
Training large neural networks on big data can take days or weeks, and it may require adjustments for optimal performance, such as adding more memory or computing power.
Sometimes it’s necessary to experiment with multiple architectures such as nonlinear activation functions or different regularization techniques like dropout or batch normalization.Nearest Neighbor
Clustering algorithms divide a larger set of input into smaller sets so that those sets can be more easily visualized -Nearest Neighbor is one such algorithm because it breaks the input up based on the distance between data points.
For example, if we had an input set containing pictures of animals and cars, the nearest neighbor would break the inputs into two clusters. The nearest cluster would contain images with similar shapes (i.e., animals or cars), and the furthest cluster would contain images with different shapes.Convolutional Neural Networks (CNN)
Convolutional neural networks are a class of artificial neural networks that employ convolutional layers to extract features from the input. CNNs are frequently used in computer vision because they can process visual data with fewer moving parts, i.e., they’re efficient and run well on computers. In this sense, they fit the problem better than traditional deep learning models. The basic idea is that at each layer, one-dimensionality is dropped out of the input; so for a given pixel, there is a pooling layer for just spatial information, then another for just color channels, then one more for channel-independent filters or higher-level activation functions.Long Short Term Memory Neural Network (LSTMNN)
Several deep learning algorithms can be combined in many different ways to produce models that satisfy certain properties. Today, we will discuss the Long Short-Term Memory Neural Network (LSTMNN). LSTM networks are great for detecting patterns and have been found to work well in NLP tasks, image recognition, classification, etc. The LSTMNN is a neural network that consists of LSTM cells.Recurrent Neural Network ( RNN )
An RNN is an artificial neural network that processes data sequentially. In comparison to other neural networks, RNNs can understand arbitrary sequential data better and are better at predicting sequential patterns. The main issue with RNNs is that they require very large amounts of memory, so many are specialized for a single sequence length. They cannot process input sequences in parallel because the hidden state must be saved across time steps. This is because each time step depends on the previous time step, and future time steps cannot be predicted by looking at only one past time step.Generative Adversarial Networks (GANs) Support Vector Machines (SVM)
One deep learning algorithm is Support Vector Machines (SVM). One of the most famous classification algorithms, SVM, is a numerical technique that uses a set of hyperplanes to separate two or more data classes. In binary classification problems, hyperplanes are generally represented by lines in a two-dimensional plane. Generally, an SVM is trained and used for a particular problem by tuning parameters that govern how much data each support vector will contribute to partitioning the space. The kernel function determines how one feature vector maps into an SVM; it could be linear or nonlinear depending on what is being modeled.Artificial Neural Networks (ANN)
ANNs are networks that are composed of artificial neurons. The ANN is modeled after the human brain, but there are variations. The type of neuron being used and the type of layers in the network determine the behavior.
ANNs typically involve an input layer, one or more hidden layers, and an output layer. These layers can be stacked on top of each other and side by side. When a new piece of data comes into the input layer, it travels through the next layer, which might be a hidden layer where it does computations before going on to another layer until it reaches the output layer.
The decision-making process involves training an ANN with some set parameters to learn what outputs should come from inputs with various conditions.Autoencoders Section: Compositional Pattern Producing Networks (CPPN)
Compositional Pattern Producing Networks (CPPN) is a kind of autoencoder, meaning they’re neural networks designed for dimensionality reduction. As their name suggests, CPPNs create patterns from an input set. The patterns created are not just geometric shapes but very creative and organic-looking forms. CPPN Autoencoders can be used in all fields, including image processing, image analysis, and prediction markets.Conclusion
To summarize, deep learning algorithms are a powerful and complex technology capable of identifying data patterns. They enable us to parse information and recognize trends more efficiently than ever.
Furthermore, they help businesses make more informed decisions with their data. I hope this guide has given you a better understanding of deep learning and why it is important for the future.
There are many deep learning algorithms, but the most popular ones used today are Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN).
I would recommend taking some time to learn about these two approaches on your own to decide which one might be best for your situation.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
Update the detailed information about Introduction To Deep Learning In Julia on the Kientrucdochoi.com website. We hope the article's content will meet your needs, and we will regularly update the information to provide you with the fastest and most accurate information. Have a great day!