A Basic Introduction to OpenCV in Deep Learning (updated February 2024)
This article was published as a part of the Data Science Blogathon.
OpenCV is a large open-source library for computer vision, machine learning, and image processing, and it plays a critical role in real-time operations, which are fundamental in today’s systems. It is used to detect objects, faces, diseases, lesions, number plates, and even handwriting in images and videos. With OpenCV in deep learning, we represent image features in a vector space and perform mathematical operations on these features to identify visual patterns and their var…
Table of Contents
What is Computer Vision?
Installing and Importing the OpenCV Image Preprocessing Package
Reading an Input Image
Image Data Type
Image Pixel Values
Viewing the Images
Image Operations Using OpenCV and Python
Functionality of OpenCV
Summary
What is Computer Vision?
Computer vision is the study of how photos and videos are stored, and of how to manipulate them and extract information from them. Many artificial intelligence applications depend on computer vision: self-driving cars, robotics, and picture editing apps all rely heavily on it.
Computer vision resembles human vision. Human vision learns from life experience and uses it to distinguish objects, judge the distance between them, and estimate their relative positions.
Source: Analytics Vidhya
With cameras, data, and algorithms, computer vision trains machines to accomplish these jobs in much less time.
Computer vision allows computers and systems to extract useful data from digital images and video inputs.
Installing and Importing the OpenCV Image Preprocessing Package
OpenCV is an important building block in many machine learning pipelines. It is an open-source library (package) for computer vision, machine learning, and image processing; the standard Python package runs on the CPU. It works with many programming languages, including Python, and can be installed with a single command:

    pip install opencv-python

A package in Python is a collection of modules that contain pre-written programs. You can import a package as a whole or import individual modules from it. Using OpenCV is as simple as importing the “cv2” module:

    import cv2
Reading an Input Image
Colour, grayscale, binary, and multispectral images are all examples of digital images. In a colour image, each pixel carries its own colour information. Binary images have only two colours, usually black and white, and grayscale images contain only shades of grey. Multispectral images capture data within specific wavelength ranges across the electromagnetic spectrum.
To read the image, we use the “imread” method from the cv2 package, where the first parameter is the image’s path, including filename and extension, and the second parameter is a flag that determines how to read in the image.
The following code reads an image to use as input:

    import cv2

    # Read the image from disk with cv2.imread
    img = cv2.imread("pythonlogo.png", cv2.IMREAD_COLOR)

    # Create a GUI window to display the image on screen
    cv2.imshow("Cute Kittens", img)

Image Data Type
To discover the image’s data type, use the “dtype” attribute. It tells us how the visual data and the pixel values are represented.
The image itself is a NumPy array: a multidimensional container for items of the same type and size.
Pixel values for the image
An image can be thought of as a collection of small samples. These samples are referred to as pixels. To get a better understanding of an image, try zooming in as much as possible: you will see that it is divided into many small squares. These are pixels, and when all of them are combined, they form an image. One of the simplest ways to represent an image is as a matrix.
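To make the matrix idea concrete, here is a minimal sketch in plain Python (no OpenCV needed). The 3x4 image and the helper name `shape` are made up for illustration; a real image loaded with cv2.imread is a NumPy array, but the idea of rows and columns of pixel values is the same.

```python
# A tiny 3x4 grayscale "image": each inner list is a row, each number a pixel (0-255).
image = [
    [0, 64, 128, 255],
    [32, 96, 160, 224],
    [255, 128, 64, 0],
]

def shape(img):
    """Return (rows, columns) of a list-of-lists image."""
    return len(img), len(img[0])

rows, cols = shape(image)
pixel_count = rows * cols                    # the resolution: total number of pixels
brightest = max(max(row) for row in image)   # largest pixel value in the image

print(rows, cols, pixel_count, brightest)
```

Zooming in on a real picture shows the same structure, only with far more rows and columns.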
Code:

    print("The data type of the image is", img.dtype)

Output:

    The data type of the image is uint8

uint8 means that each pixel value is an unsigned 8-bit integer, so values range from 0 to 255.
Image Resolution
Image resolution is defined as the number of pixels in an image. As the number of pixels rises, the image can show more detail. The image’s shape gives the number of rows and columns. Common image resolutions:
• 320 x 240 pixels (mostly suitable for small-screen devices)
• 1024 x 768 pixels (appropriate for viewing on standard computer monitors)
• 720 x 576 pixels (good for viewing on standard-definition TV sets with a 4:3 aspect ratio)
• 1280 x 720 pixels (for viewing on widescreen monitors)
• 1280 x 1024 pixels (for viewing on full-screen monitors)
Image Pixel Values
An image can be thought of as a collection of small samples called pixels. Zoom in on a picture as far as possible and you will see it divided into small squares; these are the pixels that together make up the image.
The level of detail in an image increases as the number of pixels increases. The image’s shape determines the number of rows and columns.
Viewing the Images
Let’s look at how to make the image appear in a window. To do so, we create a graphical user interface (GUI) window with the cv2.imshow() method. Its first parameter is the window title, which must be a string, and its second is the image. On its own, however, imshow can leave you with a window that is unresponsive or closes immediately. We can use the “waitKey” method to avoid this.
The “waitKey” parameter is set to 0 to keep the window open until a key is pressed. (You can instead pass a time in milliseconds to set how long the window stays open.)

    import cv2

    # Read the image from disk with cv2.imread
    img = cv2.imread("python logo.png", cv2.IMREAD_COLOR)

    # Create a GUI window to display the image on screen.
    # The first parameter is the window title (a string),
    # the second parameter is the image array.
    cv2.imshow("The Logo", img)

    # Hold the window on screen. With 0 as the parameter,
    # waitKey waits until a key is pressed.
    cv2.waitKey(0)

    # Remove the created GUI windows from the screen and from memory
    cv2.destroyAllWindows()
Output: GUI Window, Source: Author
Extracting and Reconstructing Image Bit Planes
An image can be divided into several levels of bit planes. An 8-bit image splits into eight planes (0-7), with the last few planes (the most significant bits) containing the majority of the image’s visual information.
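The following is a small sketch of bit-plane extraction and reconstruction in plain Python, under the assumption that the image is an 8-bit grayscale list of lists. The function names `bit_planes` and `reconstruct` and the 2x2 sample image are illustrative, not part of OpenCV.

```python
def bit_planes(img):
    """Split an 8-bit grayscale image into 8 binary planes; plane 0 is the least significant bit."""
    return [[[(pixel >> bit) & 1 for pixel in row] for row in img] for bit in range(8)]

def reconstruct(planes, keep_bits):
    """Rebuild an image using only the bit planes listed in keep_bits."""
    rows, cols = len(planes[0]), len(planes[0][0])
    return [
        [sum(planes[bit][r][c] << bit for bit in keep_bits) for c in range(cols)]
        for r in range(rows)
    ]

image = [[200, 13], [97, 255]]
planes = bit_planes(image)

full = reconstruct(planes, range(8))         # all planes: identical to the input
top_four = reconstruct(planes, range(4, 8))  # only the 4 most significant planes

print(full)      # [[200, 13], [97, 255]]
print(top_four)  # [[192, 0], [96, 240]]
```

Keeping only the top four planes changes each pixel by at most 15, which is why those planes are said to hold most of the image.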
Image Operations Using OpenCV and Python
Checking Properties of the Input Image
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    img = plt.imread("my pic.jpg")
    plt.imshow(img)
    print(img.shape)
    print(img.size)
    print(img.dtype)

Output:

    (1921, 1921, 3)
    11070723
    uint8
Basic Image Processing
    import cv2
    import numpy as np

    image = cv2.imread("baby yoda.jpg")
    # cv2.imshow("Example - Show image in window", image)
    # OpenCV loads images in BGR channel order; convert to RGB
    img2 = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

Source: Author
Dilation and Erosion of the Input Image
    import cv2
    import numpy as np
    import matplotlib.pyplot as plt

    img = plt.imread("baby yoda.jpg")

    # Use a 5x5 matrix of ones as the kernel
    kernel = np.ones((5, 5), np.uint8)

    # The first parameter is the original image, the second is the kernel
    # the image is convolved with, and the third is the number of iterations,
    # which determines how strongly the image is eroded or dilated.
    img_erosion = cv2.erode(img, kernel, iterations=1)
    img_dilation = cv2.dilate(img, kernel, iterations=1)

    # Show each result in its own figure so later calls do not overwrite earlier ones
    for result in (img, img_erosion, img_dilation):
        plt.figure()
        plt.imshow(result)
    plt.show()
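To see what erosion and dilation actually compute, here is a rough pure-Python sketch on a tiny binary image. It is not how OpenCV implements these operations; the helper names `morph`, `erode`, and `dilate` are illustrative, and ignoring pixels outside the image is an assumed border rule.

```python
def morph(img, size, pick):
    """Apply pick (min for erosion, max for dilation) over a size x size window.
    Pixels outside the image are ignored (an assumed border rule)."""
    rows, cols, half = len(img), len(img[0]), size // 2
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - half), min(rows, r + half + 1))
                for cc in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out_row.append(pick(window))
        out.append(out_row)
    return out

def erode(img, size=3):
    return morph(img, size, min)   # a pixel survives only if its whole window is set

def dilate(img, size=3):
    return morph(img, size, max)   # a pixel is set if anything in its window is set

image = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
eroded = erode(image)    # the 3x3 square shrinks to its single centre pixel
dilated = dilate(image)  # the 3x3 square grows to fill the whole 5x5 image

for row in eroded:
    print(row)
```

Erosion shrinks bright regions and dilation grows them, which is the effect the 5x5 kernel has on the photograph above, only at a larger scale.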
Source: Author, Image after erosion effect
Source: Author, Image after dilation effect
OpenCV Applications
• Face recognition
• Counting the number of people (foot traffic in a mall, for example)
• Counting the number of automobiles on motorways and their speeds
• Interaction-based art installations
• Detecting anomalies (the odd defective product) during the production process
• Street view image stitching
• Video/image search and retrieval
• Robot and autonomous car navigation and control
• Object recognition
• Medical image analysis
• Movies – 3D structure from motion
Functionality of OpenCV
• I/O, processing and display of images and videos
• Detection of objects and features
• Computer visi
• Computer-assisted photography
Summary
In this article, we covered a basic introduction to the OpenCV library and its applications in real-time scenarios. We also covered key terminology and the field where OpenCV in deep learning is deployed (computer vision), and implemented Python code for some basic image operations (dilation, erosion, and changing image colours) with the help of the OpenCV library. Beyond that, OpenCV in deep learning finds application in a variety of industries.
My name is Pranshu Sharma and I am a Data Science Enthusiast
The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.
This article was published as a part of the Data Science Blogathon.
Overview
Today the data science field is dominated by Python and R, but a more recent competitor has joined them: Julia, which we will explore in this guide. Julia’s well-known motto is:
Looks like Python, runs like C
We know that Python is used for a wide range of tasks. Julia, on the other hand, was primarily developed for scientific computation, machine learning, and statistical tasks.
Since Julia was explicitly made for high-level statistical work and scientific computation, it has several benefits over Python. In linear algebra, for example, “vanilla” (raw) Julia performs better than “vanilla” (raw) Python. This is primarily because plain Python has no built-in matrix type or matrix operations, whereas Julia provides them in the language itself.
While Python is a great language with its NumPy library, Julia outperforms it when no extra packages are used, as Julia caters more directly to machine learning tasks and computations.
Table of contents
Training a Classifier
So, let’s get started!
Introduction
This guide gets you started with the mechanics of Flux so that you can start building models right away. While it is loosely based on a tutorial by PyTorch, it covers all the necessary areas. It introduces basic Julia programming as well as Zygote, a source-to-source automatic differentiation (AD) framework in Julia. Using these tools, we will build a simple neural network and, in the end, a CNN that we will train to classify images into 10 classes.
What is Flux in Julia?
Flux is an open-source machine learning library written entirely in Julia. The stable release we will be using is v0.12.4. As one would expect, it has a layer-stacking interface for simple models, and rather than having a monolithic design it emphasises interoperability with other Julia packages. For example, GPU support comes directly from CuArrays. This is in contrast to other frameworks available in Julia that are implemented in different languages and only bound to Julia, such as TensorFlow (Julia package), and are thus more or less limited by the functionality of their underlying implementation.
Installation of Julia
Before we move further, if you don’t have Julia installed on your system, it can be downloaded from the official Julia website.
To use Julia in a Jupyter notebook as we would Python, we only need to add the IJulia package:

    using Pkg
    Pkg.add("IJulia")

We can then use Julia in a Jupyter notebook for exploratory data analysis, just as we would Python.
Arrays in Julia
Before moving on to the framework, we need to understand the basic building blocks of a deep learning framework: arrays, CUDA arrays, and so on. In this section, I’ll explain the basics of arrays in Julia.
Here’s an array with just three elements.

    x = [10, 12, 16]

Here’s a matrix: a square array with four elements.

    x = [10 12; 13 14]

And here’s a 12×2 matrix with 24 elements, each a random number ranging from 0 to 1.

    x = rand(12, 2)
rand is not the only function that can create a matrix (array); we can also use functions like ones, zeros, or randn. Try them out in the Jupyter notebook to see what they do.
By default, Julia stores numbers in a high-precision format called Float64. In machine learning we often don’t need that many digits, so we can ask Julia to use Float32 instead, or BigFloat if we need more than 64 bits of precision. Below are examples of random 6×3 matrices (18 elements each) of BigFloat and of Float32.

    x = rand(BigFloat, 6, 3)
    x = rand(Float32, 6, 3)
To count the number of elements in a matrix we can use the length function.

    length(x)

Or, if we need the size along each dimension, we can check it more explicitly.

    size(x)

We can do many sorts of algebraic operations on matrices. For example, we can add two matrices.

    x + x

Or subtract them.

    x - x
Julia supports a feature called broadcasting, using the “.” syntax. broadcast() is a built-in Julia function that applies a function f over collections, arrays, or tuples, and the concise dot syntax makes it easy to apply a function to one or more arrays. For example, f.(a, b) means “apply f elementwise to a and b”. We can use broadcasting on our matrix to add 1 to every element of x.

    x .+ 1
Finally, we need matrix multiplication more or less every time we do machine learning. It is super easy to use in Julia.

    W = randn(4, 10)
    x = rand(10)
    W * x

CUDA Arrays in Julia
CUDA functionality is provided separately by Julia’s CUDA package. If you have a GPU and CUDA available, you can run ] add CUDA in IJulia (Jupyter notebook) to get it. Once CUDA is installed (check version compatibility; this guide refers to Julia versions below 1.6), we can transfer our arrays to the GPU as CUDA arrays using the cu function. A CUDA array supports all the basic functionality of an array but works on the GPU.
In this section, I will briefly demonstrate the use of the CuArray type. Since CUDA’s functionality is exposed by implementing existing Julia interfaces on the CuArray type, you should refer to the upstream Julia documentation for more in-depth information on these operations.

    import Pkg
    Pkg.add("CUDA")

Import the library and convert the matrix into a CuArray. The CuArray now lives on the GPU, where large array operations are typically much faster than on the CPU, and we barely had to change anything.

    using CUDA
    x = cu(rand(6, 3))

Flux.jl in Julia
Flux is a Julia package made specifically for machine learning. It comes with a vast range of functionality (such as automatic differentiation) that helps us harness the full potential of Julia without getting our hands messy. Flux.jl follows a few key principles:
and will be faster.
You could have written Flux from scratch: from LSTMs to GPU kernels, it is all straightforward Julia code. Whenever in doubt, you can always look at the documentation. If you need something different, you can easily write your own code.
Integrates nicely with others: Flux works well with Julia libraries from DataFrames to Images (the Images package) and even differential equation solvers (another Julia package for computation), so you can easily build complex data processing pipelines that integrate Flux models.
You can add Flux using Julia’s package manager, by typing ] add Flux at the Julia prompt, or use

    import Pkg
    Pkg.add("Flux")

Automatic Differentiation
Automatic differentiation (AD), also called algorithmic differentiation or simply “autodiff”, is used to compute derivatives of functions. It is a family of techniques, of which backpropagation is one example, for efficiently evaluating derivatives of numeric functions expressed as computer programs.
You have probably learned to differentiate functions in calculus classes, but let’s recap it in Julia code.

    f(x) = 4x^2 + 12x + 3
    f(4)
In simple cases like this one, we can easily find the gradient by hand; here it is 8x + 12. But it’s much faster and more efficient to let Flux do it for us! gradient returns a tuple with one entry per argument, so we take the first entry.

    using Flux: gradient
    df(x) = gradient(f, x)[1]
    df(4)

We can cross-check with a few more inputs to see that the gradient calculated by Flux is correct and is indeed 8x + 12. We can also differentiate repeatedly, and since f is a quadratic function, its second derivative is just the constant 8.

    ddf(x) = gradient(df, x)[1]
    ddf(4)
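For reference, the values that the calls f(4), df(4), and ddf(4) should produce can be worked out by hand from the definition of f:

```latex
\begin{aligned}
f(x)   &= 4x^2 + 12x + 3, & f(4)   &= 64 + 48 + 3 = 115,\\
f'(x)  &= 8x + 12,        & f'(4)  &= 32 + 12 = 44,\\
f''(x) &= 8,              & f''(4) &= 8.
\end{aligned}
```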
As long as the mathematical functions we create in Julia are differentiable, we can use Flux’s automatic differentiation on any code we throw at it, including recursion, loops, and even custom layers. For example, we can differentiate the Taylor series approximation of the sin function.

    mysin(x) = sum((-1)^k * x^(1 + 2k) / factorial(1 + 2k) for k in 0:6)
    x = 0.6
    mysin(x), gradient(mysin, x)
    sin(x), cos(x)
As expected, the derivative is numerically very close to cos(x), which is the derivative of sin(x).
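The same check can be sketched without automatic differentiation. The Python version below (Python rather than Julia, using only the standard library) differentiates the Taylor polynomial term by term by hand; the function name `mysin_derivative` is illustrative.

```python
import math

def mysin(x, terms=7):
    """Taylor polynomial of sin(x): sum of (-1)^k x^(2k+1) / (2k+1)! for k = 0..terms-1."""
    return sum((-1) ** k * x ** (2 * k + 1) / math.factorial(2 * k + 1) for k in range(terms))

def mysin_derivative(x, terms=7):
    """Term-by-term derivative of the polynomial above: sum of (-1)^k x^(2k) / (2k)!."""
    return sum((-1) ** k * x ** (2 * k) / math.factorial(2 * k) for k in range(terms))

x = 0.6
print(mysin(x), math.sin(x))             # the polynomial tracks sin(x) closely
print(mysin_derivative(x), math.cos(x))  # its derivative tracks cos(x) closely
```

Automatic differentiation arrives at the same derivative without anyone having to derive the second function by hand.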
What if, instead of a single number, we take arrays as inputs? This gets more interesting as we proceed. Let’s take an example where we have a function that takes a matrix and two vectors.

    myloss(W, b, x) = sum(W * x .+ b) # calculating loss
    W = randn(3, 5)
    b = zeros(3)
    x = rand(5)
    gradient(myloss, W, b, x)
Now we get gradients for each of the inputs W, b, and x, which will come in very handy when we train our model. Since machine learning models can contain thousands of parameters or more, Flux provides a slightly different way of writing gradient. As in other deep learning frameworks, we mark our arrays with params to indicate that we want their gradients. W and b represent the weights and bias respectively.

    using Flux: params
    W = randn(3, 5)
    b = zeros(3)
    x = rand(5)
    y(x) = sum(W * x .+ b)

Using those parameters we can now get the gradients of W and b directly. This is especially useful when working with layers. Think of a layer as a container for parameters. For example, the Dense layer from Flux performs the familiar linear transform.

    using Flux
    m = Dense(10, 5)
    x = rand(Float32, 10)

To get the parameters of any layer or model we can always simply use params from Flux.

    params(m)

So even if our network has many, many parameters, we can easily calculate the gradient for all of them.

    x = rand(Float32, 10) # random input array
    m = Chain(Dense(10, 5, relu), Dense(5, 2), softmax) # a small network of layers
    l(x) = sum(Flux.crossentropy(m(x), [0.5, 0.5])) # loss function
    l(x)
We don’t explicitly have to use layers but sometimes they can be very convenient for many simple kinds of models and faster iterations.
The next step is to update the weights of the network and perform optimization using different algorithms. The first optimization algorithm that comes to mind is gradient descent, because of its simplicity. We take the weights and step them against their gradients, scaled by a learning rate, which is a hyperparameter: weights = weights - learning_rate × gradient.

    using Flux.Optimise: update!, Descent
    η = 0.1 # learning rate
    for p in params(m)
        # Assumed update step: grads is taken to hold the gradients,
        # e.g. grads = gradient(() -> l(x), params(m))
        update!(p, η * grads[p])
    end
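As a rough illustration of the update rule weights = weights - learning_rate × gradient, here is a plain Python sketch that applies it to the quadratic f(x) = 4x^2 + 12x + 3 from the automatic differentiation section. The derivative is written by hand here instead of being computed by Flux, and the function name `gradient_descent`, the starting point, and the step count are arbitrary choices.

```python
def f(x):
    return 4 * x**2 + 12 * x + 3

def df(x):
    """Derivative of f, worked out by hand: 8x + 12."""
    return 8 * x + 12

def gradient_descent(x0, learning_rate=0.1, steps=50):
    """Repeat x = x - learning_rate * gradient for a fixed number of steps."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * df(x)
    return x

x_min = gradient_descent(x0=4.0)
print(x_min, f(x_min))  # approaches x = -1.5, where f reaches its minimum of -6
```

With a learning rate of 0.1 each step shrinks the distance to the minimum by a factor of five; a rate that is too large would overshoot instead, which is why the learning rate is treated as a hyperparameter.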
While updating the parameters in place using gradients as above is valid, it gets more complicated as the algorithms become more involved. Here Flux comes to the rescue with its prebuilt set of optimizers, which makes our work much easier. All we need to do is give the algorithm a learning rate.

    opt = Descent(0.01)

Training a network then reduces to iterating over the dataset multiple times (epochs) and performing all the steps in order (given below in code). For the sake of simplicity and clarity, let’s train a network that learns to predict 0.5 for every input of 10 floats. Flux has a function called train! that does all this for us.

    data, labels = rand(10, 100), fill(0.5, 2, 100) # dataset
    loss(x, y) = sum(Flux.crossentropy(m(x), y)) # loss function
    Flux.train!(loss, params(m), [(data, labels)], opt) # train the model

You don’t have to use train!. In cases where arbitrary logic might be better suited, you could open up the training loop like so:

    for d in training_set # assuming d looks like (data, labels)
        # our logic here
        gs = gradient(params(m)) do # m is our model
            l = loss(d...)
        end
        update!(opt, params(m), gs)
    end
This concludes the basics of Flux usage. In the next section, we will use it to train a classifier for the CIFAR10 dataset.
Training a Classifier for the Deep Learning Model
Getting a real classifier to work should help fix the Julia workflow in mind a bit more. CIFAR10 is a dataset of 50,000 tiny training images split into 10 classes such as dogs, birds, and deer. The reader is requested to check the image below for more details.
We will do the following steps in order to get a classifier trained:
1. Load the CIFAR10 dataset (both training and test sets)
2. Create a Convolutional Neural Network (CNN)
3. Define a loss function to calculate losses
4. Use training data to train our network
5. Evaluate our model on the test dataset
There are a few useful libraries to install before we proceed. Installation is simple but might take a few minutes to complete.

    ] add Metalhead # to get the data
    ] add Images # image processing package
    ] add ImageIO # to output images
Loading the Dataset
Metalhead.jl is an excellent package that has tons of classic predefined and pre-trained CV (computer vision) models. It also includes a variety of data loaders that come in handy when loading datasets.

    using Statistics
    using Flux, Flux.Optimise # deep learning framework
    using Metalhead, Images # to load the dataset
    using Metalhead: trainimgs
    using Images.ImageCore # to work on image processing
    using Flux: onehotbatch, onecold # to encode labels
    using Base.Iterators: partition
    using CUDA # for GPU functionality

This image will give us an idea of the different types of labels we are dealing with.

    Metalhead.download(CIFAR10) # download the CIFAR10 dataset
    X = trainimgs(CIFAR10) # take the training dataset as X
    labels = onehotbatch([X[i].ground_truth.class for i in 1:50000], 1:10) # encode the labels

To get more information about what we are dealing with, let’s take a look at a few random images from the dataset.

    image(x) = x.img # handy for use later
    ground_truth(x) = x.ground_truth
    image.(X[rand(1:end, 10)]) # show the images in IJulia itself
Each image is made of three RGB layers that together form a 32×32×3 array. Since the dataset is large, we pass it to the model in batches of 1000 and keep a separate set for validation to evaluate our model. Passing data in batches is called mini-batch learning and is very popular in machine learning. In layman’s terms, rather than sending the entire dataset, which is big and might not fit in RAM, we break it into small packets (mini-batches), usually chosen randomly, and train the model on each in turn. Mini-batches are also observed to help escape saddle points (points where the gradient is zero but that are neither a minimum nor a maximum).
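The batching idea can be sketched in a few lines of plain Python. The helper below mimics what a consecutive partition of indices does; it is an illustration written for this guide, not Julia’s Base.Iterators.partition itself, and the 49,000 figure is the training split size the tutorial uses.

```python
def partition(indices, batch_size):
    """Yield consecutive chunks of indices, each with at most batch_size items."""
    indices = list(indices)
    for start in range(0, len(indices), batch_size):
        yield indices[start:start + batch_size]

# 49,000 training indices (1-based, as in Julia) split into batches of 1000
batches = list(partition(range(1, 49001), 1000))

print(len(batches))                       # 49
print(batches[0][0], batches[0][-1])      # 1 1000
print(batches[-1][0], batches[-1][-1])    # 48001 49000
```

Each batch of indices then selects the images and labels for one training step, so only 1000 images need to be in memory at a time.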
First, we define a ‘getarray’ function that converts the images to arrays of floats.

    getarray(X) = float.(permutedims(channelview(X), (2, 3, 1))) # convert an image to a float array
    imgs = [getarray(X[i].img) for i in 1:50000] # convert all images to float arrays

Of the 50,000 images, the first 49,000 make up our training set, processed in batches of 1000, and the rest are saved for validation. To achieve this we can use the function ‘partition’, which handily breaks the set we give it into consecutive parts of 1000, and to concatenate we use the ‘cat’ function along any dimension.

    valset = 49001:50000
Defining the Classifier
Now comes the part where we can define our Convolutional Neural Network (CNN).
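A convolutional neural network defines a kernel and slides it across a matrix to create an intermediate representation from which features are extracted. Deeper layers produce higher-order features, which makes the architecture well suited to images (although it can be used in plenty of other situations), where the structure of the subject is what helps us determine which class it belongs to.

    m = Chain( # creating a CNN
        # The Conv and reshape lines are assumptions: they are absent from this copy
        # of the code and are sized so that 200 features reach the first Dense layer
        Conv((5, 5), 3 => 16, relu),
        MaxPool((2, 2)), # first pooling layer of the CNN
        Conv((5, 5), 16 => 8, relu),
        MaxPool((2, 2)), # second pooling layer of the CNN
        x -> reshape(x, :, size(x, 4)), # flatten to (features, batch)
        Dense(200, 120), # first dense layer
        Dense(120, 84), # second dense layer
        Dense(84, 10), # third and final layer with 10 classification labels
        softmax) # assumed: turns the 10 outputs into probabilities for crossentropy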
Whenever we work with data that has multiple independent classes, cross-entropy comes in handy as a loss. Momentum, as the name suggests, accumulates a running average of past gradients so that updates keep moving in a consistent direction. This dampens oscillations, reduces the chance of overshooting the desired destination, and helps the optimizer move past poor local minima.

    using Flux: crossentropy, Momentum # import the loss function and the optimizer
    loss(x, y) = sum(crossentropy(m(x), y)) # loss function
    opt = Momentum(0.01) # Momentum optimizer with learning rate 0.01
Before starting our training loop, we need some basic accuracy numbers for our model to keep track of our progress. We can write a custom function to do just that.

    accuracy(x, y) = mean(onecold(m(x), 1:10) .== onecold(y, 1:10))
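To show what the accuracy function computes, here is a small plain Python sketch of the same idea: pick the highest-scoring class for each sample, compare it with the class marked in the one-hot label, and average the matches. The helper names mirror the Julia ones but are written from scratch here, and the three samples are made-up numbers.

```python
def onecold(scores):
    """Index (0-based here) of the largest score in one output vector."""
    return max(range(len(scores)), key=lambda i: scores[i])

def accuracy(predictions, targets):
    """Fraction of samples whose predicted class matches the one-hot target class."""
    hits = sum(onecold(p) == onecold(t) for p, t in zip(predictions, targets))
    return hits / len(predictions)

# Three samples, three classes (made-up numbers): two predictions are right.
predictions = [[0.1, 0.7, 0.2], [0.8, 0.1, 0.1], [0.3, 0.3, 0.4]]
targets = [[0, 1, 0], [0, 0, 1], [0, 0, 1]]

print(accuracy(predictions, targets))  # 2 of 3 correct
```

In the Julia version the samples are columns of a matrix rather than rows of a list, but the comparison is the same.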
Training the Classifier
This is the part where we finally stitch everything together and run all the operations we defined previously to see what our model is capable of. For this tutorial we use only 10 passes over the dataset (epochs); for greater accuracy you can increase the number of epochs and play with the hyperparameters a bit.

    epochs = 10 # number of passes over the dataset
    for epoch = 1:epochs
        for d in train
            gs = gradient(params(m)) do
                l = loss(d...) # calculate the loss
            end
            update!(opt, params(m), gs) # update the parameters (weights)
        end
        @show accuracy(valX, valY) # show the accuracy of the model after each epoch
    end

Watching the accuracy after each epoch gives us a brief idea of how the network is learning. The resulting accuracy is not bad at all for a small model trained for few epochs with no hyperparameter tuning.
Training on a GPU
Testing the Network
We have trained our neural network for 10 passes over the training dataset, but we still need to check whether the model has learned anything at all. To do so, we predict the class label from the network’s output and check it against the true class label. If the prediction is correct, we count that sample as a correct prediction. This is done on the still unseen part of the data.
First, we process the images in the same way as the training data so that the two can be compared side by side.

    valset = valimgs(CIFAR10) # validation set
    valimg = [getarray(valset[i].img) for i in 1:10000] # convert images to arrays
    labels = onehotbatch([valset[i].ground_truth.class for i in 1:10000], 1:10) # encode the labels
    test = gpu.([(cat(valimg[i]..., dims = 4), labels[:, i]) for i in partition(1:10000, 1000)])

Next, we display some of the images from our validation dataset.

    ids = rand(1:10000, 10) # random image ids
    image.(valset[ids]) # show the images
We have 10 values as the output for all 10 classes. If the particular value is higher for a class, our network thinks that image is from that particular class. The below image shows the values (energies) in 10 floats and every column corresponds to the output of one image.
Let’s see how our model fared on the dataset.

    rand_test = getarray.(image.(valset[ids])) # get the test images
    rand_truth = ground_truth.(valset[ids]) # true labels to check the outputs against
    m(rand_test)

This looks very similar to what we would have expected. Now let’s see how our model, even after its short training period, actually performs on new data that we prepared.

    accuracy(test[1]...) # testing accuracy on the first test batch

49% is clearly much better than the 10% chance of guessing correctly at random (since we have 10 classes), which is not bad at all for a small hand-coded model without hyperparameter tuning like ours.
Let’s take a look at how the net performed on each class individually.

    class_correct = zeros(10) # correct predictions per class
    class_total = zeros(10) # samples per class
    for i in 1:10
        preds = m(test[i][1]) # predictions for the images in batch i
        lab = test[i][2] # one-hot labels of batch i
        for j = 1:1000
            pred_class = findmax(preds[:, j])[2] # index of the predicted class
            actual_class = findmax(lab[:, j])[2] # index of the true class
            if pred_class == actual_class # count a correct prediction
                class_correct[pred_class] += 1
            end
            class_total[actual_class] += 1
        end
    end
    class_correct ./ class_total # fraction of correct predictions per class
The spread seems pretty good, but some classes perform significantly better than others. It is left to the reader to explore the reason.
Conclusion
In this article, we learned how powerful Julia is when it comes to computation. We learned about the Flux package and how to use it to train our hand-written model to classify images into 10 different classes in just a few lines of code, and on a GPU at that! We also learned about CuArrays and their significance in decreasing computation time. I hope this article has been helpful in starting your journey with Flux (Julia).
Thanks to Mike Innes, Andrew Dinhobl, Ygor Canalli et al. for valuable documentation. Reach out to me via LinkedIn (Nihal Singh).
This article was published as a part of the Data Science Blogathon.
INTRODUCTION
Deep Learning is a subset of Machine Learning based on Artificial Neural Networks. The main idea behind Deep Learning is to mimic the working of a human brain. Use cases of Deep Learning include face recognition, machine translation, speech recognition, and more. Learning can be supervised, semi-supervised, or unsupervised.
What is Neural Style Transfer?
If you are an artist, I am sure you must have wondered: what if I could paint like Picasso? Deep Learning answers that question with an interesting solution: Neural Style Transfer.
In layman’s terms, Neural Style Transfer is the art of applying a style to any content. The content is the layout or sketch, and the style is the painting or the colours. It is an application of image transformation using Deep Learning.
How does it work?
Unsurprisingly, there have been quite a few approaches to NST, but we will start with the traditional implementation for basic understanding and then explore more!
The base idea on which Neural Style Transfer is proposed is that “it is possible to separate the style representation and content representations in a CNN, learned during a computer vision task (e.g. image recognition task).”
I assume you have heard about the ImageNet competition, which introduced us to state-of-the-art models starting with AlexNet, then VGG, then ResNet, and many more. What all these models have in common is that they are trained on the large ImageNet dataset (about 14 million images in total; the competition uses 1000 classes), which makes them understand the ins and outs of any image. We leverage this quality by separating the content and the style parts of an image and providing a loss function to optimize towards the required result.
As stated earlier, we define a pre-trained convolutional model and loss functions that blend two images visually. We therefore require the following inputs:
A Content Image – image on which we will transfer style
A Style Image – the style we want to transfer
An Input Image (generated) – the final image, combining the content with the required style
IMPLEMENTATION
Model
As mentioned, we will use a pre-trained convolutional neural network. Transfer learning gives us a shortcut: libraries like Keras provide these large models and let us experiment with them on our own problem statements. Here we will use Keras for transfer learning, and we can load the model in a few lines of code.
The first two lines import libraries such as Keras. We then load the model using vgg19.VGG19(), where include_top = False indicates that we don’t want the fully connected layers at the top of the network, including the final softmax output layer used to classify the 1000 classes in the competition.
The fourth line builds a dictionary that stores each layer name as the key and the layer output as the value. Finally, we define our model with the VGG input specification as inputs and the dictionary of layer outputs as outputs.
Next, we will define the layers from which we will extract our content and style characteristics.
We have already made the dictionary where we can map these layers and extract the outputs.
Loss Functions
To get the desired image, we have to define a loss function that steers the optimization towards the required result. Here we will be using the concept of per-pixel losses.
Per-pixel loss is a metric used to understand the differences between images at the pixel level. It compares the output pixel values with the input values. (Another method is perceptual loss functions, which we will discuss briefly in the later stages of the blog.) Per-pixel loss has its drawbacks when it comes to representing every meaningful characteristic, and that is where perceptual losses come into the picture. The loss terms we will focus on are content loss and style loss.
Content Loss
Content loss makes sure the content we want in the generated image is captured efficiently. It has been observed that a CNN captures information about the content in the higher levels of the network, whereas the lower levels are more focused on the individual pixel values.
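The original listing is not reproduced on this page, so here is a minimal sketch of a content loss. It assumes the sum-of-squared-differences form and uses NumPy's `np.sum` as a stand-in for TensorFlow's `reduce_sum`. The names `base` and `combination` follow the article's own description; the tiny arrays are hypothetical values chosen only for illustration.

```python
import numpy as np

def content_loss(base, combination):
    """Sum of squared differences between two feature arrays of equal shape."""
    base = np.asarray(base, dtype=np.float64)
    combination = np.asarray(combination, dtype=np.float64)
    return float(np.sum(np.square(combination - base)))

# Hypothetical 2x2 single-channel feature maps (not taken from a real network).
base = np.array([[1.0, 2.0], [3.0, 4.0]])
combination = np.array([[1.0, 2.5], [2.0, 4.0]])

print(content_loss(base, combination))  # 0.5**2 + 1.0**2 = 1.25
```

The loss is zero only when the generated features match the content features exactly, which is what the optimizer is pushed towards.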
Here, base holds the content features, while combination holds the features of the generated output image. reduce_sum computes the sum of elements across the dimensions of its argument, which in this case is the difference of corresponding pixels between the input (content) and generated images.
Style Loss
Defining the loss function for style takes more work than for content, as multiple layers are involved in the computation. The style information is measured as the amount of correlation present between the feature maps in each layer. Here we use the Gram matrix to compute the style loss. So what is a Gram matrix?
The Gram matrix is the measure by which we capture the distribution of features over a set of feature maps in a given layer. So when you minimize the style loss, you are making the distribution of features the same in both the style and generated images.
So the idea is to make Gram matrices of the style and generated images and then compute the difference between the two. Each entry Gij of the Gram matrix is the elementwise product of the ith and jth feature maps of a layer, summed across height and width, as shown above.
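As a rough illustration of the Gram matrix idea, the sketch below computes Gram matrices with NumPy and compares them with a sum of squared differences. The function names, the (height, width, channels) layout, and the omission of any normalization constant are assumptions made for this example, not the article's original code.

```python
import numpy as np

def gram_matrix(features):
    """features has shape (height, width, channels).

    Entry (i, j) of the result is the elementwise product of feature maps
    i and j, summed over height and width.
    """
    features = np.asarray(features, dtype=np.float64)
    h, w, c = features.shape
    flat = features.reshape(h * w, c)  # one row per spatial position
    return flat.T @ flat               # shape (channels, channels)

def style_loss(style_features, generated_features):
    """Sum of squared differences between the two Gram matrices.

    Normalization factors used by some implementations are left out here.
    """
    s = gram_matrix(style_features)
    g = gram_matrix(generated_features)
    return float(np.sum(np.square(s - g)))

# Hypothetical feature maps: 1x2 spatial positions, 2 channels.
style = np.array([[[1.0, 0.0], [0.0, 1.0]]])
generated = np.array([[[1.0, 1.0], [0.0, 0.0]]])

print(gram_matrix(style))            # identity matrix: the two channels never fire together
print(style_loss(style, generated))  # 2.0
```

Because the Gram matrix sums over all spatial positions, it discards where a feature occurs and keeps only which features occur together, which is why it captures style rather than layout.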
Now we have computed both the loss functions. Therefore to calculate the final loss we will compute a weighted summation of both the computed content and style losses.
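A minimal sketch of that weighted summation follows. The weights `content_weight` and `style_weight`, their default values, and the choice to average the style loss over layers are illustrative assumptions; the article does not state the values it used.

```python
def total_loss(content_loss_value, style_loss_values,
               content_weight=1.0, style_weight=1e-3):
    """Weighted sum of one content loss and the per-layer style losses.

    The style losses are averaged over layers (an assumption of this sketch)
    so that adding more style layers does not change the overall scale.
    """
    style_term = sum(style_loss_values) / len(style_loss_values)
    return content_weight * content_loss_value + style_weight * style_term

# Hypothetical loss values for one content layer and two style layers.
print(total_loss(1.25, [2.0, 4.0], content_weight=1.0, style_weight=0.5))  # 1.25 + 0.5 * 3.0 = 2.75
```

Raising `style_weight` relative to `content_weight` pushes the result towards heavier stylization, and lowering it keeps the output closer to the content image.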
The above code is the final integration of the losses: it traverses the layers and computes the final loss as a weighted summation in the second-last line. Finally, we have to define an optimizer (Adam or SGD) that will optimize the loss of the network.
OUTPUT
There are many faster variants of NST, which I would encourage you to explore and improve upon. One concept to follow is perceptual loss with an image transformation network, which speeds up NST: you train the image transformation network once per style and can then apply that style to any content image without retraining.
This is more helpful in deployment environments, because the traditional approach runs an optimization for each pair of content and style images, while this concept allows one-time training per style followed by fast stylization of many content images.
BRIEF INTRODUCTION TO FAST NST
Training a style transfer model requires two networks: a pre-trained feature extractor and a transfer network. The pre-trained feature extractor is used to avoid having to use paired training data. Its usefulness arises from the curious tendency for individual layers of deep convolutional neural networks trained for image classification to specialize in understanding specific features of an image.
The pre-trained model enables us to compare the content and style of two images, but it doesn’t actually help us create the stylized image. That’s the job of a second neural network, which we’ll call the transfer network. The transfer network is an image translation network that takes one image as input and outputs another image. Transfer networks typically have an encoder-decoder architecture.
At the beginning of training, one or more style images are run through the pre-trained feature extractor, and the outputs at various style layers are saved for later comparison. Content images are then fed into the system. Each content image passes through the pre-trained feature extractor, where outputs at various content layers are saved. The content image then passes through the transfer network, which outputs a stylized image. The stylized image is also run through the feature extractor, and outputs at both the content and style layers are saved.
The quality of the stylized image is defined by a custom loss function that has terms for both content and style. The extracted content features of the stylized image are compared to the original content image, while the extracted style features are compared to those from the reference style image(s). After each step, only the transfer network is updated. The weights of the pre-trained feature extractor remain fixed throughout. By weighting the different terms of the loss function, we can train models to produce output images with lighter or heavier stylization.
Congratulations, you have learned what Neural Style Transfer is and how it works. But that is certainly not the end: next comes exploring the topic through more recent research papers, blogs, and faster implementations, and for that you now have a kick start. I hope you enjoyed this blog, which targeted the basic traditional workflow of Neural Style Transfer, and I hope it helped build an intuition for understanding NST.
Thank you for reading!
This article was published as a part of the Data Science Blogathon.
Introduction to Deep Learning Algorithms
The goal of deep learning is to create models that learn abstract features. This is accomplished by building models composed of many layers, in which lower layers capture the low-level details of the input while higher layers build progressively more abstract representations.
As we train these deep learning networks, the training data shapes the weights that determine how information is interpreted.
These weights are learned with stochastic gradient descent algorithms, which use backpropagation to compute the gradients for updating the network parameters.
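To make the update rule concrete, here is a minimal sketch of stochastic gradient descent on a one-parameter model, y = w * x. It is a toy under stated assumptions: the samples, learning rate, and epoch count are invented, the samples are visited in a fixed order rather than shuffled, and the gradient is written out by hand instead of being produced by backpropagation through a multi-layer network.

```python
def sgd_fit(samples, learning_rate=0.05, epochs=50):
    """Fit y = w * x by updating w after every single sample."""
    w = 0.0
    for _ in range(epochs):
        for x, y in samples:
            prediction = w * x
            # Gradient of the squared error (prediction - y)**2 with respect to w.
            gradient = 2.0 * (prediction - y) * x
            w -= learning_rate * gradient
    return w

# Hypothetical data generated by the rule y = 2 * x.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = sgd_fit(samples)
print(w)  # converges very close to 2.0
```

In a deep network the same update is applied to every weight at once, with backpropagation supplying each weight's gradient.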
Training large neural networks on big data can take days or weeks, and it may require adjustments for optimal performance, such as adding more memory or computing power.
Sometimes it’s necessary to experiment with multiple design choices, such as different nonlinear activation functions or regularization techniques like dropout or batch normalization.
Nearest Neighbor
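Clustering algorithms divide a larger set of inputs into smaller sets so that those sets can be more easily visualized. Nearest Neighbor is a related distance-based method: it groups inputs according to the distance between data points. Strictly speaking, it is usually used as a supervised method that assigns a new point the label of its closest known example.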
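The distance-based idea can be sketched in a few lines. This example assumes each input has already been reduced to a two-number feature vector and that a handful of labeled examples exist; the points and labels are hypothetical.

```python
import math

def nearest_neighbor(query, labeled_points):
    """Return the label of the labeled point closest to `query` (Euclidean distance)."""
    best_label, best_distance = None, math.inf
    for point, label in labeled_points:
        distance = math.dist(query, point)
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label

# Hypothetical 2-D feature vectors for two groups of images.
labeled_points = [
    ((0.0, 0.0), "animal"),
    ((0.2, 0.1), "animal"),
    ((5.0, 5.0), "car"),
    ((5.2, 4.9), "car"),
]

print(nearest_neighbor((0.1, 0.3), labeled_points))  # animal
print(nearest_neighbor((4.8, 5.1), labeled_points))  # car
```

Note that `math.dist` requires Python 3.8 or later.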
For example, if we had an input set containing pictures of animals and cars, the nearest neighbor would break the inputs into two clusters. The nearest cluster would contain images with similar shapes (i.e., animals or cars), and the furthest cluster would contain images with different shapes.
Convolutional Neural Networks (CNN)
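To make the convolution operation that gives these networks their name concrete, here is a minimal NumPy sketch that slides a small filter over an image. The image and filter values are hypothetical, and the function uses no padding and a stride of 1. Like most deep learning libraries, it computes cross-correlation, meaning the filter is not flipped.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    image = np.asarray(image, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter with the patch under it and sum the result.
            output[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return output

# Hypothetical image: dark on the left, bright on the right.
image = np.array([[0, 0, 1, 1],
                  [0, 0, 1, 1],
                  [0, 0, 1, 1]])
kernel = np.array([[-1, 1]])  # responds where intensity increases left to right

feature_map = conv2d_valid(image, kernel)
print(feature_map)  # each row is [0, 1, 0]: the filter fires only at the edge
```

The same small filter is reused at every position, which is why a convolutional layer needs far fewer weights than a layer that connects every pixel to every output.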
Convolutional neural networks are a class of artificial neural networks that employ convolutional layers to extract features from the input. CNNs are frequently used in computer vision because they can process visual data with fewer moving parts, i.e., they’re efficient and run well on computers. In this sense, they fit the problem better than traditional fully connected networks. The basic idea is that each layer reduces the dimensionality of its input: for a given pixel, there is a pooling layer for just spatial information, then another for just color channels, then one more for channel-independent filters or higher-level activation functions.
Long Short Term Memory Neural Network (LSTMNN)
Several deep learning algorithms can be combined in many different ways to produce models that satisfy certain properties. Today, we will discuss the Long Short-Term Memory Neural Network (LSTMNN). LSTM networks are great for detecting patterns and have been found to work well in NLP tasks, image recognition, classification, etc. The LSTMNN is a neural network that consists of LSTM cells.
Recurrent Neural Network (RNN)
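A recurrent network processes a sequence one step at a time and carries a hidden state forward from each step to the next. The sketch below shows that loop for a plain tanh recurrent cell in NumPy. The weight matrices and inputs are hypothetical fixed values rather than trained parameters, and the function names are illustrative.

```python
import numpy as np

def rnn_forward(inputs, w_x, w_h, b):
    """Run a simple tanh RNN over `inputs` and return the hidden state at every step."""
    hidden = np.zeros(w_h.shape[0])
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state,
        # so the steps cannot be computed in parallel.
        hidden = np.tanh(w_x @ x_t + w_h @ hidden + b)
        states.append(hidden)
    return np.stack(states)

# Hypothetical parameters: 2 input features, 2 hidden units.
w_x = np.eye(2)
w_h = 0.5 * np.eye(2)
b = np.zeros(2)

sequence_a = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sequence_b = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # differs only at step 0

states_a = rnn_forward(sequence_a, w_x, w_h, b)
states_b = rnn_forward(sequence_b, w_x, w_h, b)
print(states_a[-1], states_b[-1])  # final states differ although the last two inputs match
```

The two sequences share their last two inputs, yet the final hidden states differ, which shows how earlier time steps influence later ones through the hidden state.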
An RNN is an artificial neural network that processes data sequentially. In comparison to other neural networks, RNNs can understand arbitrary sequential data better and are better at predicting sequential patterns. The main issue with RNNs is that they require very large amounts of memory, so many are specialized for a single sequence length. They cannot process input sequences in parallel because the hidden state must be saved across time steps. This is because each time step depends on the previous time step, and future time steps cannot be predicted by looking at only one past time step.
Generative Adversarial Networks (GANs)
Support Vector Machines (SVM)
Support Vector Machines (SVM) are often listed alongside deep learning algorithms, although they are a classical machine learning method. One of the most famous classification algorithms, SVM is a numerical technique that uses a set of hyperplanes to separate two or more data classes. In binary classification problems with two features, the hyperplane is a line in a two-dimensional plane. Generally, an SVM is trained and used for a particular problem by tuning parameters that govern how much each support vector contributes to partitioning the space. The kernel function determines how feature vectors are compared inside the SVM; it can be linear or nonlinear depending on what is being modeled.
Artificial Neural Networks (ANN)
ANNs are networks that are composed of artificial neurons. The ANN is modeled after the human brain, but there are variations. The type of neuron being used and the type of layers in the network determine the behavior.
ANNs typically involve an input layer, one or more hidden layers, and an output layer. These layers can be stacked on top of each other and side by side. When a new piece of data comes into the input layer, it travels through the next layer, which might be a hidden layer where it does computations before going on to another layer until it reaches the output layer.
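The layer-by-layer flow described above can be sketched as a forward pass in NumPy. The layer sizes, weights, and the choice of ReLU for the hidden layer are assumptions made for illustration; a real network would learn its weights during training.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Pass `x` through a list of (weights, bias) layers.

    ReLU is applied after every layer except the last (output) layer.
    """
    activation = np.asarray(x, dtype=np.float64)
    for index, (weights, bias) in enumerate(layers):
        activation = weights @ activation + bias
        if index < len(layers) - 1:
            activation = relu(activation)
    return activation

# Hypothetical network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    (np.array([[1.0, -1.0], [0.5, 0.5]]), np.array([0.0, 0.0])),
    (np.array([[1.0, 2.0]]), np.array([0.5])),
]

output = forward([2.0, 1.0], layers)
print(output)  # hidden = [1.0, 1.5], output = 1.0 + 3.0 + 0.5 = [4.5]
```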
The decision-making process involves training an ANN with some set parameters to learn what outputs should come from inputs with various conditions.
Autoencoders
Compositional Pattern Producing Networks (CPPN)
Compositional Pattern Producing Networks (CPPNs) are a kind of autoencoder, meaning they’re neural networks designed for dimensionality reduction. As their name suggests, CPPNs create patterns from an input set. The patterns created are not just geometric shapes but very creative and organic-looking forms. CPPN autoencoders can be used in all fields, including image processing, image analysis, and prediction markets.
Conclusion
To summarize, deep learning algorithms are a powerful and complex technology capable of identifying data patterns. They enable us to parse information and recognize trends more efficiently than ever.
Furthermore, they help businesses make more informed decisions with their data. I hope this guide has given you a better understanding of deep learning and why it is important for the future.
There are many deep learning algorithms, but the most popular ones used today are Recurrent Neural Networks (RNN) and Convolutional Neural Networks (CNN).
I would recommend taking some time to learn about these two approaches on your own to decide which one might be best for your situation.
The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.
The business goal of the system is to help its customers – smartphone manufacturers, OEMs or any media publishers – bring a “discovery with your camera” experience to end users. ViSenze’s end-to-end platform is powered by computer vision and deep learning and supports multi-class, multi-label, detection, and visual embedding training models. By adopting its platform, clients can quickly train deep learning models and deploy them based on their own unique requirements. The company deployed a Shoppable UGC tool in early 2023, which automatically understands and tags user-generated content, making items within images easy to discover, search, and purchase.
The Mission to Bring Simplification to Visual Experiences
The company’s mission is to simplify the way people discover the visual world. As consumers continue to buy more and more products online, ViSenze recognized that shopping via visuals, as opposed to keywords, was much more efficient. This rising e-commerce trend, combined with the increased usage of social media platforms (which are increasingly visual), allows consumers to discover and be inspired by products on a more regular basis. Allowing them to purchase products at that exact moment of inspiration is therefore critical for retailers.
ViSenze recognized this shift in behaviour and the impact that artificial intelligence (AI) could have on the retail commerce landscape for both brands and online shoppers. This prompted the company to develop technology that not only recognizes images but also analyzes their content (visual attributes, metadata, keywords, etc.) to help retailers offer recommendations and visually similar products. Currently, ViSenze works with some of the world’s leading retailers like Rakuten, Uniqlo, H&M, Myntra, ASOS, and Zalora.
A Futuristic Leader
Oliver Tan is the CEO & Co-founder of ViSenze. Oliver started his career in the corporate and startup world, handling everything from new business development, partnerships, and investor relations to consultative selling, bid management, and group operations. Oliver and his co-founders established ViSenze at NExT, a collaboration lab established between the National University of Singapore and Tsinghua University. Oliver and his team recognized a common problem when it came to finding products on the web: people were searching with keywords but not finding what they wanted, because of the mismatch between the keywords they used and the product taxonomies that retailers and marketplaces had. Oliver later used this market problem as the thesis of his journey into startup entrepreneurship, and he inspires his team every day to further develop ViSenze’s solutions in line with the company’s goals and initiatives: making shopping more efficient across any platform at any time.
As CEO, Oliver is actively involved in company partnerships and business development, as well as day-to-day tasks that keep the company growing, like fundraising, working with programs that fuel hiring, speaking at local and global events, and more.
A Proactive Approach to AI Tech
AI is growing rapidly, so it’s up to the companies that specialize in AI tech, like ViSenze, to further invest in R&D and develop their solutions to positively impact the industry they are focused on. For ViSenze, that’s the retail space, as AI disrupts and positively transforms the online shopping experience for both brands and online shoppers. New technologies are being applied to a multitude of areas within retail, from out-of-stock management to consumer insights. Most notably, retailers are using AI, and specifically ViSenze’s technology, to solve two of the biggest challenges within the industry: search and more personalized recommendations.
Disruptive Technologies to Rule Innovation
Oliver believes that disruptive technologies like AI will be the driving force behind innovation. Until recently, humans were the main drivers of innovation, and now technology is being leveraged more and more to increase efficiency and reduce manual labour. While many fear the impact these technologies could have on society, such as replacing humans in the workforce, the reverse is also true: processing data at massive scale, which has normally been challenging for humans, is now made possible with AI tools. “It’s important to understand that AI can be deployed to complement human tasks, and it will, in fact, increase the speed of innovation as we can complete tasks much faster than we have in the past,” he added.
Driving Growth Through Strategic Partnerships
ViSenze understood that to set themselves apart from their competitors, they needed to be innovative and find a market differentiator that would make them an ideal partner to customers. At ViSenze, the underlying technology is 100% deep learning and they were one of the first few machine learning startups to combine deep learning with computer vision.
ViSenze is a leader in the market and offers a highly configurable solution rather than a pre-fixed one, as it understands that strategies and techniques that perform well for one retailer may perform differently for another; its solutions are therefore easy to use and customizable. The company has established key partnerships with retailers like Rakuten, ASOS, and Uniqlo, and foresaw that every customer needed its own personal touch in order to be a successful business enterprise.
Valuable Achievements and Global Recognition
Over the past few years, ViSenze has been recognized many times for its achievements and continued innovation in the AI field. The company has been awarded the Interbrand Breakthrough Brand, CognitionX’s Best AI in Retail Award, VentureBeat’s Top 5 Startup, Datamation’s Top 20 Startup, ASEAN ICT Award, Asian Entrepreneurship Award, and most recently the Business HR Awards in Singapore.
Challenges Faced by ViSenze
One of ViSenze’s greatest challenges has been quickly expanding into new markets with the right people at the right time. The company found that AI innovation is challenged by market adoption. ViSenze is in a good spot right now, and the company is focused on expanding into larger markets like China and the US. This calls for a lot of travel, learning about these places, talking to new retailers, networking, and educating friends and partners. Another challenge has been finding the right talent, especially in a small country like Singapore. ViSenze’s employees come from multiple nationalities and love working for ViSenze, so while finding talent is a challenge for most AI companies, the company has overcome this pretty efficiently.
The Bright Future Ahead
As customer viewing preferences continue to shift towards visual over text, search will eventually become entirely visually driven; therefore, visual search will be adopted as the main technology for product and information discovery. ViSenze foresees algorithms becoming smart enough to predict consumers’ usual shopping preferences with over 90% accuracy each time they ask for a recommendation.
Deep learning (DL) could be defined as a form of machine learning based on artificial neural networks which harness multiple processing layers in order to extract progressively better and more high-level insights from data. In essence it is simply a more sophisticated application of artificial intelligence (AI) platforms and machine learning (ML).
Here are some of the top trends in deep learning:
Model Scale Up
A lot of the excitement in deep learning right now is centered around scaling up large, relatively general models (now being called foundation models). They are exhibiting surprising capabilities such as generating novel text, images from text, and video from text. Anything that scales up AI models adds yet more capabilities to deep learning. This is showing up in algorithms that go beyond simplistic responses to multi-faceted answers and actions that dig deeper into data, preferences, and potential actions.
Scale Up Limitations
However, not everyone is convinced that the scaling up of neural networks is going to continue to bear fruit. Roadblocks may lie ahead.
“There is some debate about how far we can get in terms of aspects of intelligence with scaling alone,” said Peter Stone, PhD, Executive Director, Sony AI America.
“Current models are limited in several ways, and some of the community is rushing to point those out. It will be interesting to see what capabilities can be achieved with neural networks alone, and what novel methods will be uncovered for combining neural networks with other AI paradigms.”
AI and Model Training
AI isn’t something you plug in and, presto, instant insights. It takes time for the deep learning platform to analyze data sets, spot patterns, and begin to derive conclusions that have broad applicability in the real world. The good news is that AI platforms are rapidly evolving to keep up with model training demands.
“Organizations can enhance their AI platforms by combining open-source projects and commercial technologies,” said Bin Fan, VP Open Source and Founding Engineer at Alluxio.
“It is essential to consider skills, speed of deployment, the variety of algorithms supported, and the flexibility of the system while making decisions.”
“Containerization being the key, Kubernetes will aid cloud-native MLOps in integrating with more mature technologies,” said Fan.
Prescriptive Modeling over Predictive Modeling
Modeling has gone through many phases over the years. Initial attempts tried to predict trends from historical data. This had some value, but didn’t take into account factors such as context, sudden traffic spikes, and shifts in market forces. In particular, real-time data played no real part in early efforts at predictive modeling.
As unstructured data became more important, organizations wanted to mine it to glean insight. Coupled with the rise in processing power, real-time analysis suddenly rose to prominence. And the immense amounts of data generated by social media have only added to the need to address real-time information.
How does this relate to AI, deep learning, and automation?
“Many of the current and previous industry implementations of AI have relied on the AI to inform a human of some anticipated event, who then has the expert knowledge to know what action to take,” said Frans Cronje, CEO and Co-founder of DataProphet.
“Increasingly, providers are moving to AI that can anticipate a future event and take the correspondent action.”
This opens the door to far more effective deep learning networks. With real time data being constantly used by multi-layered neural networks, AI can be utilized to take more and more of the workload away from humans. Instead of referring the decision to a human expert, deep learning can be used to prescribe predicted decisions based on historical, real-time, and analytical data.