SAP ABAP BDC (Batch Data Communication) Tutorial


Introduction to Batch input

Batch input is typically used to transfer data from non-R/3 systems to R/3 systems or to transfer data between R/3 systems.

It is a data transfer technique that allows you to transfer datasets automatically to screens belonging to transactions, and thus to an SAP system. Batch input is controlled by a batch input session.


Batch input session

A batch input session groups a series of transaction calls together with input data and user actions. It can be used to execute a dialog transaction in batch input, where some or all of the screens are processed by the session. Batch input sessions are stored in the database as database tables and can be used within a program as internal tables when accessing transactions.

Points to note

BDC works by carrying out normal SAP transactions just as a user would, but it executes the transactions automatically. All screen validations and business logic validations are performed when using Batch Data Input.

It is suitable for entering large amounts of data.

No manual interaction is required.

Methods of Batch Input

SAP provides two basic methods for transferring legacy data into the R/3 System.

Classical Batch Input method.

Call Transaction Method.

Classical Batch Input method

After creating the session, you can run the session to execute the SAP transaction in it.

This method uses the function modules BDC_OPEN_GROUP, BDC_INSERT and BDC_CLOSE_GROUP.

A batch input session can be processed in 3 ways:

In the foreground

In the background

During processing, with error display

You should process batch input sessions in the foreground or using the error display if you want to test the data transfer.

If you want to execute the data transfer or test its performance, you should process the sessions in the background.

Points to note about Classical Batch Input method

Synchronous processing

Transfer data for multiple transactions.

Synchronous database update.

A batch input process log is generated for each session.

Sessions cannot be generated in parallel.

Call Transaction Method.

In this method, the ABAP/4 program uses the CALL TRANSACTION USING statement to run an SAP transaction.

The entire batch input process takes place online in the program.

Points to Note:

Faster processing of data

Asynchronous processing

Transfer data for a single transaction.

No batch input processing log is generated.

Batch Input Procedures

You will typically observe the following sequence of steps to develop a batch input solution for your organization:

Analysis of the legacy data. Determine how the data to be transferred is to be mapped into the SAP structure. Also take note of any necessary data type or data length conversions.

Generate SAP data structures for use in export programs.

Export the data in to a sequential file. Note that character format is required by predefined SAP batch input programs.

If the SAP supplied BDC programs are not used, code your own batch input program. Choose an appropriate batch input method according to the situation.

Process the data and add it to the SAP System.

Analyze the process log. For the CALL TRANSACTION method, where no proper log is created, use the messages collected by your program.

From the results of the process analysis, correct and reprocess the erroneous data.

Writing BDC program

You may observe the following process to write your BDC program

Analyze the transaction(s) to process batch input data.

Decide on the batch input method to use.

Read data from a sequential file

Perform data conversion or error checking.

Store the data in the batch input structure, BDCDATA.

Generate a batch input session for classical batch input, or process the data directly with the CALL TRANSACTION USING statement.

Batch Input Data Structure

Declaration of batch input data structure


Field name   Type   Length   Description
PROGRAM      CHAR   40       Module pool
DYNPRO       NUMC   4        Dynpro number
DYNBEGIN     CHAR   1        Starting a dynpro
FNAM         CHAR   132      Field name
FVAL         CHAR   132      Field value

The order of fields within the data for a particular screen is not significant.

Points to Note

While populating the BDC data, make sure that you take the user settings into consideration. This is especially relevant when filling fields that involve numbers (like quantity or amount). The user settings determine the grouping character for numbers, e.g. the number fifty thousand can be written as 50,000.00 or 50.000,00 depending on the user settings.

Condense the FVAL field for amount and quantity fields so that they are left aligned.

Note that all the fields that you populate through BDC should be treated as character-type fields when populating the BDCDATA table.

On some screens, when you populate values in a table control using BDC, you have to note how many rows are present at the default screen size and code for only that many rows. If you have to populate more rows, you must code the "Page down" functionality, just as you would when filling the table control manually.

The number of rows that appear in the above scenario differs with the screen size the user works at. So always code for the standard screen size, so that your BDC works at the standard screen size irrespective of the user's own screen size setting.

Creating Batch Input Session

Open the batch input session using the function module BDC_OPEN_GROUP.

For each transaction in the session:

Fill the BDCDATA with values for all screens and fields processed in the transaction.

Transfer the transaction to the session with BDC_INSERT.

Close the batch input session with BDC_CLOSE_GROUP.

Batch Input Recorder

Begin the batch input recorder by selecting the Recording pushbutton from the batch input initial screen.

The recording name is a user defined name and can match the batch input session name which can be created from the recording.

Enter an SAP transaction and begin posting it.

After you have completed posting an SAP transaction, you either choose Get Transaction and Save to end the recording, or Next Transaction and post another transaction.

Once you have saved the recording you can create a batch input session from the recording and/or generate a batch input program from the recording.

The batch input session you created can now be analyzed just like any other batch input session.

The program which is generated by the function of the batch input recorder is a powerful tool for the data interface programmer. It provides a solid base which can then be altered according to customer requirements.


Introduction To Master Data In Sap

What is Master Data?

Data stored in SAP R/3 is categorized as

Master Data and

Transactional Data.

If you are producing, transferring stock, selling, purchasing, doing physical inventory, whatever your activity may be, it requires certain master data to be maintained.

Example of Master Data

Material master data

Customer master data

Vendor master data

Pricing/conditions master data

Warehouse management master data (storage bin master data)

The ones we will focus on in the MM module are the material master and the purchase info record.

Material Master: What you should know about material master?

Material in SAP is a logical representation of certain goods or service that is an object of production, sales, purchasing, inventory management etc. It can be a car, a car part, gasoline, transportation service or consulting service, for example.

All the information on materials' potential use and characteristics in SAP is called the material master. This is considered the most important master data in SAP (there are also customer master data, vendor master data, conditions/pricing master data etc.), and all processing of materials is influenced by the material master. That is why it's crucial to have a precise and well maintained material master.

In order to be confident in your actions, you need to understand material master views and their implications on processes in other modules and business transactions, plus a few more helpful details, like the tables that store material master data and the transactions for mass material maintenance (for changing certain characteristics for a large number of materials at once).

Material types

In SAP ERP, every material has a characteristic called “material type” which is used throughout the system for various purposes.

Why is it essential to differentiate between material types and what does that characteristic represent?

It can represent a type of origin and usage – like a finished product (produced goods ready for sale), semifinished product (used as a part of a finished product), trading goods (for resale), raw materials (used for production of semifinished and finished products) etc. These are some of the predefined SAP material types among others like food, beverages, service and many others.

We can define our own custom material types if none of the standard ones fulfills our needs.

Most used material types in standard SAP installation

What can be configured on material type level (possible differences between types)?

Material master views: It defines the views associated with a Material Type. For example, if we have a material type “FERT” assigned to our material Product 1000 – we don’t want to have Purchasing based views for that material because we don’t need to purchase our own product – it is configured on material type level.

Default price control: we can set this control to standard or moving average price (covered later in detail), but this can be changed in material master to override the default settings.

Default Item category group: used to determine item category in sales documents. It can be changed in material master to override the default settings.

Internal/external purchase orders, special material type indicators, and a few more.

Offered material types in MM01 transaction

So a material type is assigned to materials that share the same basic settings for material master views, price control, item category group and a few others. The material type can be assigned during the creation of the material in t-code MM01 (covered in detail later).

Where can we find a complete list of materials with their respective material type?

There are numerous transactions for this. The raw data itself is stored in MARA table

(you can view table contents with t-code SE16 or SE16N – newest version of the transaction), but in some systems these t-codes aren’t allowed for a standard user. In such cases, we can easily acquire the list with t-code MM60 (Material list). MM60 is used particularly often as it displays a lot of basic material characteristics.

Selection screen – you can enter only the material number:

Selection screen for MM60 transaction

We can see that material 10410446 in plant AR01 is of type FERT (finished product).

MM60 report results with the export button highlighted

Using the toolbar button highlighted on screen, we can export the list of materials we have selected on screen.

Material group

Another characteristic an SAP material is assigned during its creation is the "material group", which can represent a group or subgroup of materials based on certain criteria.

Which criteria can be used to create material groups?

Any criteria that suit your needs for reporting purposes are right for your system. You may group materials by the type of raw material used to produce them (different kinds of plastics used in the production process), or you can divide all services into consulting services (with different materials for SAP consulting, IT consulting, financial consulting etc.) and transportation services (internal transport, international transport); you can also group by production technique (materials created by welding, by extrusion, by injection etc.). Grouping depends mainly on the approach your management chooses as appropriate, and it is mostly done during the implementation; it rarely changes in a productive environment.

Assigned material group in material master

In addition, there is a material hierarchy (used mostly in sales & distribution) that can also be used for grouping, but it’s defined almost always according to sales needs as it is used for defining sales conditions (standard discounts for customers, additional discounts, special offers).

On the other hand, material group is mainly used in PP and MM module.

If you need to display material groups for multiple materials, you can use already mentioned t-code MM60. You just need to select more materials in selection criteria.

Material group in report MM60

Material group is easily subject to mass maintenance via transaction MM17. More on that in the material master editing section.

A Complete Python Tutorial To Learn Data Science From Scratch


This article is a complete tutorial to learn data science using python from scratch

It will also help you to learn basic data analysis methods using python

You will also be able to enhance your knowledge of machine learning algorithms


It happened a few years back. After working on SAS for more than 5 years, I decided to move out of my comfort zone. Being a data scientist, my hunt for other useful tools was ON! Fortunately, it didn’t take me long to decide – Python was my appetizer.

I always had an inclination for coding. This was the time to do what I really loved. Code. Turned out, coding was actually quite easy!

I learned the basics of Python within a week. And, since then, I've not only explored this language in depth but also helped many others learn it. Python was originally a general-purpose language. But over the years, with strong community support, it gained dedicated libraries for data analysis and predictive modeling.

Due to the lack of resources on Python for data science, I decided to create this tutorial to help many others learn Python faster. In this tutorial, we will take bite-sized pieces of information about how to use Python for data analysis, chew on them till we are comfortable, and practice them at our own end.


Are you a beginner looking for a place to start your journey in data science and machine learning? Presenting a comprehensive course, full of knowledge and data science learning, curated just for you!

You can also check out the ‘Introduction to Data Science‘ course – a comprehensive introduction to the world of data science. It includes modules on Python, Statistics and Predictive Modeling along with multiple practical projects to get your hands dirty.

Basics of Python for Data Analysis

Why learn Python for data analysis?

Python has gathered a lot of interest recently as a choice of language for data analysis. I covered the basics of Python some time back. Here are some reasons which go in favour of learning Python:

Open Source – free to install

Awesome online community

Very easy to learn

Can become a common language for data science and production of web based analytics products.

Needless to say, it still has a few drawbacks too:

It is an interpreted language rather than a compiled one, and hence might take up more CPU time. However, given the savings in programmer time (due to ease of learning), it might still be a good choice.

Python 2.7 v/s 3.4

This is one of the most debated topics in Python. You will invariably cross paths with it, especially if you are a beginner. There is no right or wrong choice here. It totally depends on the situation and your needs. I will try to give you some pointers to help you make an informed choice.

Why Python 2.7?

Awesome community support! This is something you’d need in your early days. Python 2 was released in late 2000 and has been in use for more than 15 years.

Plethora of third-party libraries! Though many libraries have added 3.x support, a large number of modules still work only on 2.x versions. If you plan to use Python for specific applications like web development with a high reliance on external modules, you might be better off with 2.7.

Some of the features of the 3.x versions are backward compatible and can work with the 2.7 version.

Why Python 3.4?

Cleaner and faster! Python developers have fixed some inherent glitches and minor drawbacks in order to set a stronger foundation for the future. These might not be very relevant initially, but will matter eventually.

It is the future! 2.7 is the last release of the 2.x family, and eventually everyone has to shift to 3.x versions. Python 3 has had stable releases for the past 5 years, and this will continue.

There is no clear winner but I suppose the bottom line is that you should focus on learning Python as a language. Shifting between versions should just be a matter of time. Stay tuned for a dedicated article on Python 2.x vs 3.x in the near future!

How to install Python?

There are 2 approaches to install Python:

Download Python

You can download Python directly from its project site and install individual components and libraries you want

Install Package

Alternately, you can download and install a package, which comes with pre-installed libraries. I would recommend downloading Anaconda. Another option could be Enthought Canopy Express.

The second method provides a hassle-free installation, and hence I recommend it to beginners.

The limitation of this approach is that you have to wait for the entire package to be upgraded, even if you are interested in the latest version of a single library. It should not matter unless you are doing cutting-edge statistical research.

Choosing a development environment

Once you have installed Python, there are various options for choosing an environment. Here are the 3 most common options:

Terminal / Shell based

IDLE (default environment)

iPython notebook – similar to markdown in R

IDLE editor for Python

While the right environment depends on your needs, I personally prefer iPython Notebooks a lot. They provide a lot of good features for documenting while writing the code itself, and you can choose to run the code in blocks (rather than line-by-line execution).

We will use iPython environment for this complete tutorial.

Warming up: Running your first Python program

You can use Python as a simple calculator to start with:

Few things to note

You can start iPython notebook by writing “ipython notebook” on your terminal / cmd, depending on the OS you are working on

The interface shows In [*] for inputs and Out[*] for output.

You can execute code by pressing "Shift + Enter", or "ALT + Enter" if you want to insert an additional row after.

Before we deep dive into problem solving, let's take a step back and understand the basics of Python. Data structures, along with iteration and conditional constructs, form the crux of any language. In Python, these include lists, strings, tuples, dictionaries, for-loops, while-loops, if-else, etc. Let's take a look at some of these.

Python Libraries and Data Structures

Python Data Structures

Following are some data structures, which are used in Python. You should be familiar with them in order to use them as appropriate.

Lists – Lists are one of the most versatile data structures in Python. A list can simply be defined by writing comma-separated values in square brackets. Lists might contain items of different types, but usually the items all have the same type. Python lists are mutable, and individual elements of a list can be changed.

Here is a quick example to define a list and then access it:
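For instance (the values here are illustrative):

```python
# Define a list; elements can be changed because lists are mutable
squares = [1, 4, 9, 16, 25]

first = squares[0]    # index from the front: 1
last = squares[-1]    # negative index from the end: 25

squares[0] = 0        # mutate an element in place
squares.append(36)    # grow the list
```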

Strings – Strings can simply be defined by the use of single ( ' ), double ( " ) or triple ( ''' ) quotes. Strings enclosed in triple quotes ( ''' ) can span multiple lines and are used frequently in docstrings (Python's way of documenting functions). The backslash ( \ ) is used as an escape character. Please note that Python strings are immutable, so you cannot change part of a string.
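A short sketch of those points (values are illustrative):

```python
single = 'Hello'
double = "World"
doc = """A triple-quoted string
can span multiple lines."""

# \ is the escape character, e.g. \n encodes a newline
two_lines = "line1\nline2"

# Strings are immutable: item assignment raises TypeError
try:
    single[0] = 'h'
except TypeError:
    modified = single.replace('H', 'h')   # create a new string instead
```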

Tuples – A tuple is represented by a number of values separated by commas. Tuples are immutable and the output is surrounded by parentheses so that nested tuples are processed correctly. Additionally, even though tuples are immutable, they can hold mutable data if needed.

Since tuples are immutable and cannot change, they are faster to process than lists. Hence, if your list is unlikely to change, you should use a tuple instead of a list.
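A small sketch of tuple behaviour (values are illustrative):

```python
point = (3, 4)          # a tuple of two values
x, y = point            # unpacking into separate names

nested = (1, (2, 3))    # nesting works; output stays parenthesized

# Tuples are immutable: item assignment raises TypeError
try:
    point[0] = 5
except TypeError:
    immutable = True
```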

Dictionary – A dictionary is an unordered set of key: value pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: {}.
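A minimal sketch showing key uniqueness (names are illustrative):

```python
ages = {}              # a pair of braces creates an empty dictionary
ages['Alice'] = 30
ages['Bob'] = 25
ages['Alice'] = 31     # keys are unique: re-assignment overwrites

bob = ages.get('Bob')  # look up a value by key
```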

Python Iteration and Conditional Constructs

Like most languages, Python also has a for-loop, which is the most widely used method for iteration. It has a simple syntax:

for i in [Python Iterable]:
    expression(i)

For example, calculating the factorial of N:

fact = 1
for i in range(1, N+1):
    fact *= i

Coming to conditional statements, these are used to execute code fragments based on a condition. The most commonly used construct is if-else, with the following syntax:

if [condition]:
    __execution if true__
else:
    __execution if false__

For instance, if we want to print whether the number N is even or odd:

if N % 2 == 0:
    print('Even')
else:
    print('Odd')

Now that you are familiar with Python fundamentals, let’s take a step further. What if you have to perform the following tasks:

Multiply 2 matrices

Find the root of a quadratic equation

Plot bar charts and histograms

Make statistical models

Access web-pages

If you try to write code from scratch, it's going to be a nightmare and you won't stay on Python for more than 2 days! But let's not worry about that. Thankfully, there are many libraries with predefined functions which we can directly import into our code to make our life easy.

For example, consider the factorial example we just saw. We can do that in a single step as:


Of course, we need to import the math library for that. Let's explore the various libraries next.
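The factorial loop from the previous section collapses to a single library call (using N = 5 here):

```python
import math

# Equivalent to: fact = 1; then fact *= i for i in range(1, N+1)
fact = math.factorial(5)
```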

Python Libraries

Let's take one step ahead in our journey to learn Python by getting acquainted with some useful libraries. The first step is obviously to learn to import them into our environment. There are several ways of doing so in Python:

import math as m
from math import *

In the first manner, we have defined an alias m for the library math. We can now use various functions from the math library (e.g. factorial) by referencing them through the alias, e.g. m.factorial().

In the second manner, you have imported the entire namespace of math, i.e. you can directly use factorial() without referring to math.

Tip: Google recommends the first style of importing libraries, as you will then know where the functions come from.

Following is a list of libraries you will need for any scientific computations and data analysis:

SciPy stands for Scientific Python. SciPy is built on NumPy. It is one of the most useful libraries for a variety of high-level science and engineering modules like discrete Fourier transforms, linear algebra, optimization and sparse matrices.

Matplotlib for plotting a vast variety of graphs, from histograms to line plots to heat maps. You can use the Pylab feature in the iPython notebook (ipython notebook –pylab = inline) to use these plotting features inline. If you omit the inline option, pylab converts the iPython environment to an environment very similar to Matlab. You can also use LaTeX commands to add math to your plot.

Pandas for structured data operations and manipulations. It is extensively used for data munging and preparation. Pandas was added relatively recently to Python and has been instrumental in boosting Python's usage in the data science community.

Scikit Learn for machine learning. Built on NumPy, SciPy and matplotlib, this library contains a lot of efficient tools for machine learning and statistical modeling including classification, regression, clustering and dimensionality reduction.

Statsmodels for statistical modeling. Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics are available for different types of data and each estimator.

Seaborn for statistical data visualization. Seaborn is a library for making attractive and informative statistical graphics in Python. It is based on matplotlib. Seaborn aims to make visualization a central part of exploring and understanding data.

Bokeh for creating interactive plots, dashboards and data applications in modern web browsers. It empowers the user to generate elegant and concise graphics in the style of D3.js. Moreover, it has the capability of high-performance interactivity over very large or streaming datasets.

Blaze for extending the capability of Numpy and Pandas to distributed and streaming datasets. It can be used to access data from a multitude of sources including Bcolz, MongoDB, SQLAlchemy, Apache Spark, PyTables, etc. Together with Bokeh, Blaze can act as a very powerful tool for creating effective visualizations and dashboards on huge chunks of data.

Scrapy for web crawling. It is a very useful framework for getting specific patterns of data. It has the capability to start at a website home url and then dig through web-pages within the website to gather information.

SymPy for symbolic computation. It has wide-ranging capabilities from basic symbolic arithmetic to calculus, algebra, discrete mathematics and quantum physics. Another useful feature is the capability of formatting the result of the computations as LaTeX code.

Requests for accessing the web. It works similarly to the standard Python library urllib2 but is much easier to code. You will find subtle differences from urllib2, but for beginners Requests might be more convenient.

Additional libraries, you might need:

os for Operating system and file operations

networkx and igraph for graph based data manipulations

regular expressions for finding patterns in text data

BeautifulSoup for web scraping. It is less powerful than Scrapy, as it extracts information from just a single webpage in a run.

Now that we are familiar with Python fundamentals and these additional libraries, let's take a deep dive into problem solving through Python. Yes, I mean making a predictive model! In the process, we use some powerful libraries and also come across the next level of data structures. We will take you through the 3 key phases:

Data Exploration – finding out more about the data we have

Data Munging – cleaning the data and playing with it to make it better suit statistical modeling

Predictive Modeling – running the actual algorithms and having fun 🙂

Exploratory analysis in Python using Pandas

In order to explore our data further, let me introduce you to another animal (as if Python was not enough!) – Pandas


Pandas is one of the most useful data analysis libraries in Python (I know these names sound weird, but hang on!). It has been instrumental in increasing the use of Python in the data science community. We will now use Pandas to read a data set from an Analytics Vidhya competition, perform exploratory analysis and build our first basic categorization algorithm for solving this problem.

Before loading the data, let's understand the 2 key data structures in Pandas – Series and DataFrames.

Introduction to Series and Dataframes

A Series can be understood as a 1-dimensional labelled/indexed array. You can access individual elements of a Series through these labels.

A dataframe is similar to an Excel workbook – you have column names referring to columns, and you have rows which can be accessed with row numbers. The essential difference is that, in dataframes, column names and row numbers are known as the column and row index.

Series and dataframes form the core data model for Pandas in Python. Data sets are first read into these dataframes, and then various operations (e.g. group by, aggregation etc.) can be applied very easily to their columns.
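A small sketch of both structures (labels and column names are illustrative):

```python
import pandas as pd

# Series: a 1-dimensional labelled array, accessed via its index labels
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# DataFrame: labelled columns plus a row index
df = pd.DataFrame({'name': ['x', 'y'], 'value': [1, 2]})
```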

More: 10 Minutes to Pandas

Practice data set – Loan Prediction Problem

You can download the dataset from here. Here is the description of the variables:

VARIABLE DESCRIPTIONS:

Variable            Description
Loan_ID             Unique Loan ID
Gender              Male/Female
Married             Applicant married (Y/N)
Dependents          Number of dependents
Education           Applicant Education (Graduate/Under Graduate)
Self_Employed       Self-employed (Y/N)
ApplicantIncome     Applicant income
CoapplicantIncome   Coapplicant income
LoanAmount          Loan amount in thousands
Loan_Amount_Term    Term of loan in months
Credit_History      Credit history meets guidelines
Property_Area       Urban/Semiurban/Rural
Loan_Status         Loan approved (Y/N)

Let's begin with the exploration

To begin, start the iPython interface in Inline Pylab mode by typing the following on your terminal/Windows command prompt:

ipython notebook --pylab=inline

This opens up iPython notebook in pylab environment, which has a few useful libraries already imported. Also, you will be able to plot your data inline, which makes this a really good environment for interactive data analysis. You can check whether the environment has loaded correctly, by typing the following command (and getting the output as seen in the figure below):


I am currently working in Linux, and have stored the dataset in the following location:

Importing libraries and the data set:

Following are the libraries we will use during this tutorial:




Please note that you do not need to import matplotlib and numpy because of Pylab environment. I have still kept them in the code, in case you use the code in a different environment.

After importing the library, you read the dataset using the function read_csv(). This is how the code looks till this stage:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

Quick Data Exploration

Once you have read the dataset, you can take a look at a few top rows by using the function head():
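For instance, with the dataset read into a dataframe df (a tiny synthetic stand-in is used here, since the competition CSV is not bundled with this tutorial):

```python
import pandas as pd

# Hypothetical stand-in for df = pd.read_csv(...) on the loan dataset
df = pd.DataFrame({'Loan_ID': ['LP%03d' % i for i in range(20)],
                   'ApplicantIncome': [2500 + 100 * i for i in range(20)]})

top = df.head(10)   # the first 10 rows of the dataframe
```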


Passing 10 to head(), i.e. df.head(10), prints the first 10 rows (the default is 5). Alternately, you can also look at more rows by printing the dataset.

Next, you can look at a summary of the numerical fields by using the describe() function:


The describe() function provides count, mean, standard deviation (std), min, quartiles and max in its output. (Read this article to refresh the basic statistics needed to understand population distributions.)
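A sketch with a toy column; note how count excludes missing values, which is exactly how the gaps below are spotted:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'LoanAmount': [100.0, 150.0, np.nan, 200.0]})

summary = df.describe()   # count, mean, std, min, 25%, 50%, 75%, max
count = summary.loc['count', 'LoanAmount']   # NaN is excluded from count
```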

Here are a few inferences, you can draw by looking at the output of describe() function:

LoanAmount has (614 – 592) 22 missing values.

Loan_Amount_Term has (614 – 600) 14 missing values.

Credit_History has (614 – 564) 50 missing values.

We can also see that about 84% of applicants have a credit history. How? The mean of the Credit_History field is 0.84. (Remember, Credit_History has value 1 for those who have a credit history and 0 otherwise.)

The ApplicantIncome distribution seems to be in line with expectation. Same with CoapplicantIncome

Please note that we can get an idea of a possible skew in the data by comparing the mean to the median, i.e. the 50% figure.

For the non-numerical values (e.g. Property_Area, Credit_History etc.), we can look at frequency distributions to understand whether they make sense or not. A frequency table can be printed with the following command:
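In Pandas this frequency table comes from value_counts() (a sketch with toy data standing in for the real column):

```python
import pandas as pd

df = pd.DataFrame({'Property_Area': ['Urban', 'Rural', 'Urban', 'Semiurban']})

freq = df['Property_Area'].value_counts()   # counts per distinct value
```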


Similarly, we can look at the unique values of credit history. Note that dfname['column_name'] is a basic indexing technique to access a particular column of the dataframe. It can be a list of columns as well. For more information, refer to the "10 Minutes to Pandas" resource shared above.

Distribution analysis

Now that we are familiar with basic data characteristics, let us study distribution of various variables. Let us start with numeric variables – namely ApplicantIncome and LoanAmount

Let’s start by plotting the histogram of ApplicantIncome using the following commands:
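A runnable sketch of the histogram call. The income values here are randomly generated stand-ins; on the real data the call is just df['ApplicantIncome'].hist(bins=50):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import numpy as np
import pandas as pd

# Hypothetical right-skewed incomes standing in for ApplicantIncome
rng = np.random.default_rng(0)
df = pd.DataFrame({"ApplicantIncome": rng.gamma(2.0, 2500.0, 200)})

ax = df["ApplicantIncome"].hist(bins=50)
ax.set_xlabel("ApplicantIncome")
```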


Here we observe that there are few extreme values. This is also the reason why 50 bins are required to depict the distribution clearly.

Next, we look at box plots to understand the distributions. A box plot for ApplicantIncome can be plotted by:
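A sketch of the box-plot call with hypothetical values (the real call operates on the full dataframe):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import pandas as pd

# Hypothetical incomes with one extreme value to mimic the outliers
df = pd.DataFrame({"ApplicantIncome": [2000, 3000, 4000, 5000, 60000]})

# On the real data: df.boxplot(column='ApplicantIncome')
ax = df.boxplot(column="ApplicantIncome")
```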


This confirms the presence of a lot of outliers/extreme values. This can be attributed to the income disparity in the society. Part of this can be driven by the fact that we are looking at people with different education levels. Let us segregate them by Education:

df.boxplot(column='ApplicantIncome', by = 'Education')

We can see that there is no substantial difference between the median incomes of graduates and non-graduates. But there are a higher number of graduates with very high incomes, which appear to be the outliers.

Now, let’s look at the histogram and boxplot of LoanAmount using the following commands:
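A sketch of both calls on hypothetical loan amounts (the real calls are df['LoanAmount'].hist(bins=50) and df.boxplot(column='LoanAmount')):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical loan amounts with one extreme value
df = pd.DataFrame({"LoanAmount": [66.0, 120.0, 128.0, 141.0, 146.0, 700.0]})

hist_ax = df["LoanAmount"].hist(bins=50)
plt.figure()  # new figure so the boxplot does not draw over the histogram
box_ax = df.boxplot(column="LoanAmount")
```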


Again, there are some extreme values. Clearly, both ApplicantIncome and LoanAmount require some amount of data munging. LoanAmount has missing as well as extreme values, while ApplicantIncome has a few extreme values which demand deeper understanding. We will take this up in the coming sections.

Categorical variable analysis

Now that we understand the distributions for ApplicantIncome and LoanAmount, let us understand categorical variables in more detail. We will use Excel-style pivot tables and cross-tabulation. For instance, let us look at the chances of getting a loan based on credit history. This can be achieved in MS Excel using a pivot table as:

Note: here loan status has been coded as 1 for Yes and 0 for No, so the mean represents the probability of getting a loan.

Now we will look at the steps required to generate a similar insight using Python. Please refer to this article for getting a hang of the different data manipulation techniques in Pandas.

temp1 = df['Credit_History'].value_counts(ascending=True)
temp2 = df.pivot_table(values='Loan_Status', index=['Credit_History'], aggfunc=lambda x: x.map({'Y': 1, 'N': 0}).mean())
print('Frequency Table for Credit History:')
print(temp1)
print('\nProbability of getting loan for each Credit History class:')
print(temp2)

Now we can observe that we get a pivot table similar to the MS Excel one. This can be plotted as a bar chart using the “matplotlib” library with the following code:

import matplotlib.pyplot as plt
fig = plt.figure(figsize=(8,4))
ax1 = fig.add_subplot(121)
ax1.set_xlabel('Credit_History')
ax1.set_ylabel('Count of Applicants')
ax1.set_title("Applicants by Credit_History")
temp1.plot(kind='bar')
ax2 = fig.add_subplot(122)
temp2.plot(kind='bar')
ax2.set_xlabel('Credit_History')
ax2.set_ylabel('Probability of getting loan')
ax2.set_title("Probability of getting loan by credit history")

This shows that the chances of getting a loan are eight-fold if the applicant has a valid credit history. You can plot similar graphs by Married, Self-Employed, Property_Area, etc.

Alternately, these two plots can also be visualized by combining them in a stacked chart:

temp3 = pd.crosstab(df['Credit_History'], df['Loan_Status'])
temp3.plot(kind='bar', stacked=True, color=['red','blue'], grid=False)

You can also add gender into the mix (similar to the pivot table in Excel):
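One hedged way to fold Gender in is a crosstab with a two-column index. The variable name temp4 and the tiny sample frame here are assumptions for illustration; the column names match the dataset:

```python
import pandas as pd

# Hypothetical sample rows using the dataset's column names
df = pd.DataFrame({
    "Gender":         ["Male", "Female", "Male", "Female", "Male"],
    "Credit_History": [1.0, 1.0, 0.0, 1.0, 1.0],
    "Loan_Status":    ["Y", "Y", "N", "N", "Y"],
})

# Crosstab over (Credit_History, Gender) pairs vs Loan_Status
temp4 = pd.crosstab([df["Credit_History"], df["Gender"]], df["Loan_Status"])
print(temp4)
```

As before, temp4.plot(kind='bar', stacked=True) would visualize it as a stacked chart.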

If you have not realized it already, we have just created two basic classification algorithms here: one based on credit history, and the other on two categorical variables (including gender). You can quickly code this to create your first submission on AV Datahacks.

We just saw how we can do exploratory analysis in Python using Pandas. I hope your love for pandas (the animal) has increased by now, given the amount of help the library can provide in analyzing datasets.

Next, let’s explore the ApplicantIncome and Loan_Status variables further, perform data munging and create a dataset for applying various modeling techniques. I would strongly urge that you take another dataset and problem and go through an independent example before reading further.

Data Munging in Python: Using Pandas

Data munging – recap of the need

During our exploration of the data, we found a few problems in the data set which need to be solved before the data is ready for a good model. This exercise is typically referred to as “Data Munging”. Here are the problems we are already aware of:

There are missing values in some variables. We should estimate those values wisely depending on the amount of missing values and the expected importance of variables.

While looking at the distributions, we saw that ApplicantIncome and LoanAmount seemed to contain extreme values at either end. Though they might make intuitive sense, they should be treated appropriately.

In addition to these problems with numerical fields, we should also look at the non-numerical fields i.e. Gender, Property_Area, Married, Education and Dependents to see, if they contain any useful information.

If you are new to Pandas, I would recommend reading this article before moving on. It details some useful techniques of data manipulation.

Check missing values in the dataset

Let us look at missing values in all the variables because most of the models don’t work with missing data and even if they do, imputing them helps more often than not. So, let us check the number of nulls / NaNs in the dataset

df.apply(lambda x: sum(x.isnull()),axis=0)

This command should tell us the number of missing values in each column, as isnull() returns True if the value is null.

Though the missing values are not very high in number, many variables have them, and each one of these should be estimated and added to the data. Get a detailed view of different imputation techniques through this article.

Note: Remember that missing values may not always be NaNs. For instance, if the Loan_Amount_Term is 0, does it make sense, or would you consider it missing? I suppose your answer is missing, and you’re right. So we should also check for values which are impractical.
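Such disguised missing values can be counted with a simple comparison; a sketch with hypothetical terms:

```python
import pandas as pd

# Hypothetical Loan_Amount_Term values; a 0-month term is almost
# certainly a disguised missing value rather than a real term
loan_term = pd.Series([360.0, 120.0, 0.0, 360.0])

# Count entries that are exactly zero
suspicious = (loan_term == 0).sum()
print(suspicious)  # 1
```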

How to fill missing values in LoanAmount?

There are numerous ways to fill the missing values of loan amount – the simplest being replacement by mean, which can be done by following code:

df['LoanAmount'].fillna(df['LoanAmount'].mean(), inplace=True)

The other extreme could be to build a supervised learning model to predict loan amount on the basis of the other variables, and then use it to impute the missing entries.

Since the purpose now is to bring out the steps in data munging, I’ll take an approach which lies somewhere in between these two extremes. A key hypothesis is that whether a person is educated or self-employed can combine to give a good estimate of loan amount.

First, let’s look at the boxplot to see if a trend exists:
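A sketch of that check, using a tiny hypothetical frame with the dataset's column names. The boxplot call mirrors the one used for ApplicantIncome earlier; the groupby medians show the same per-group trend numerically:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display
import pandas as pd

# Hypothetical sample using the dataset's column names
df = pd.DataFrame({
    "Self_Employed": ["No", "No", "Yes", "Yes", "No"],
    "Education":     ["Graduate", "Not Graduate", "Graduate", "Graduate", "Graduate"],
    "LoanAmount":    [130.0, 110.0, 150.0, 170.0, 126.0],
})

# Boxplot of LoanAmount per (Education, Self_Employed) group
axes = df.boxplot(column="LoanAmount", by=["Education", "Self_Employed"])

# The same group-wise trend, as a table of medians
medians = df.groupby(["Self_Employed", "Education"])["LoanAmount"].median()
print(medians)
```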

Thus, we see some variations in the median of loan amount for each group, and this can be used to impute the values. But first, we have to ensure that neither the Self_Employed nor the Education variable has missing values.

As we saw earlier, Self_Employed has some missing values. Let’s look at the frequency table:

Since ~86% values are “No”, it is safe to impute the missing values as “No” as there is a high probability of success. This can be done using the following code:
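A sketch of that imputation on a hypothetical column (on the real data, the same fillna call is applied to df['Self_Employed']):

```python
import pandas as pd

# Hypothetical Self_Employed column with a missing entry
self_employed = pd.Series(["No", None, "Yes", "No"])

# Impute missing entries with the dominant category "No"
self_employed = self_employed.fillna("No")
print(self_employed.tolist())  # ['No', 'No', 'Yes', 'No']
```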


Now, we will create a Pivot table, which provides us median values for all the groups of unique values of Self_Employed and Education features. Next, we define a function, which returns the values of these cells and apply it to fill the missing values of loan amount:

table = df.pivot_table(values='LoanAmount', index='Self_Employed', columns='Education', aggfunc=np.median)

# Define function to return value of this pivot_table
def fage(x):
    return table.loc[x['Self_Employed'], x['Education']]

# Replace missing values
df['LoanAmount'].fillna(df[df['LoanAmount'].isnull()].apply(fage, axis=1), inplace=True)

This should provide you a good way to impute missing values of loan amount.

NOTE : This method will work only if you have not filled the missing values in Loan_Amount variable using the previous approach, i.e. using mean.

How to treat for extreme values in distribution of LoanAmount and ApplicantIncome?

Let’s analyze LoanAmount first. Since the extreme values are practically possible, i.e. some people might apply for high value loans due to specific needs. So instead of treating them as outliers, let’s try a log transformation to nullify their effect:

df['LoanAmount_log'] = np.log(df['LoanAmount'])
df['LoanAmount_log'].hist(bins=20)

Now the distribution looks much closer to normal, and the effect of extreme values has significantly subsided.

Coming to ApplicantIncome, one intuition can be that some applicants have a lower income but strong supporting co-applicants. So it might be a good idea to combine both incomes as total income and take a log transformation of the same.

df['TotalIncome'] = df['ApplicantIncome'] + df['CoapplicantIncome']
df['TotalIncome_log'] = np.log(df['TotalIncome'])
df['TotalIncome_log'].hist(bins=20)

Now we see that the distribution is much better than before. I will leave it up to you to impute the missing values for Gender, Married, Dependents, Loan_Amount_Term and Credit_History. Also, I encourage you to think about possible additional information which can be derived from the data. For example, creating a column for LoanAmount/TotalIncome might make sense, as it gives an idea of how well the applicant is placed to pay back the loan.

Next, we will look at making predictive models.

Building a Predictive Model in Python

After we have made the data useful for modeling, let’s now look at the Python code to create a predictive model on our data set. Scikit-Learn (sklearn) is the most commonly used library in Python for this purpose, and we will follow the trail. I encourage you to get a refresher on sklearn through this article.

Since sklearn requires all inputs to be numeric, we should convert all our categorical variables into numeric by encoding the categories. Before that, we will fill all the missing values in the dataset. This can be done using the following code:

df['Gender'].fillna(df['Gender'].mode()[0], inplace=True)
df['Married'].fillna(df['Married'].mode()[0], inplace=True)
df['Dependents'].fillna(df['Dependents'].mode()[0], inplace=True)
df['Loan_Amount_Term'].fillna(df['Loan_Amount_Term'].mode()[0], inplace=True)
df['Credit_History'].fillna(df['Credit_History'].mode()[0], inplace=True)

from sklearn.preprocessing import LabelEncoder
var_mod = ['Gender','Married','Dependents','Education','Self_Employed','Property_Area','Loan_Status']
le = LabelEncoder()
for i in var_mod:
    df[i] = le.fit_transform(df[i])
df.dtypes

Next, we will import the required modules. Then we will define a generic classification function which takes a model as input and determines the accuracy and cross-validation scores. Since this is an introductory article, I will not go into the details of coding. Please refer to this article for details of the algorithms with R and Python code. Also, it’ll be good to get a refresher on cross-validation through this article, as it is a very important measure of model performance.

#Import models from scikit learn module:
from sklearn.linear_model import LogisticRegression
from sklearn.cross_validation import KFold   #For K-fold cross validation
# (in scikit-learn >= 0.20, KFold lives in sklearn.model_selection with a different signature)
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier, export_graphviz
from sklearn import metrics

#Generic function for making a classification model and accessing performance:
def classification_model(model, data, predictors, outcome):
    #Fit the model:
    model.fit(data[predictors], data[outcome])

    #Make predictions on training set:
    predictions = model.predict(data[predictors])

    #Print accuracy
    accuracy = metrics.accuracy_score(predictions, data[outcome])
    print("Accuracy : %s" % "{0:.3%}".format(accuracy))

    #Perform k-fold cross-validation with 5 folds
    kf = KFold(data.shape[0], n_folds=5)
    error = []
    for train, test in kf:
        # Filter training data
        train_predictors = data[predictors].iloc[train,:]

        # The target we're using to train the algorithm.
        train_target = data[outcome].iloc[train]

        # Training the algorithm using the predictors and target.
        model.fit(train_predictors, train_target)

        #Record error from each cross-validation run
        error.append(model.score(data[predictors].iloc[test,:], data[outcome].iloc[test]))

    print("Cross-Validation Score : %s" % "{0:.3%}".format(np.mean(error)))

    #Fit the model again so that it can be referred to outside the function:
    model.fit(data[predictors], data[outcome])

Logistic Regression

Let’s make our first Logistic Regression model. One way would be to take all the variables into the model but this might result in overfitting (don’t worry if you’re unaware of this terminology yet). In simple words, taking all variables might result in the model understanding complex relations specific to the data and will not generalize well. Read more about Logistic Regression.

We can easily make some intuitive hypothesis to set the ball rolling. The chances of getting a loan will be higher for:

Applicants having a credit history (remember we observed this in exploration?)

Applicants with higher applicant and co-applicant incomes

Applicants with higher education level

Properties in urban areas with high growth perspectives

So let’s make our first model with ‘Credit_History’.

outcome_var = 'Loan_Status'
model = LogisticRegression()
predictor_var = ['Credit_History']
classification_model(model, df, predictor_var, outcome_var)

Accuracy : 80.945%
Cross-Validation Score : 80.946%

#We can try different combinations of variables:
predictor_var = ['Credit_History','Education','Married','Self_Employed','Property_Area']
classification_model(model, df, predictor_var, outcome_var)

Accuracy : 80.945%
Cross-Validation Score : 80.946%

Generally we expect the accuracy to increase on adding variables. But this is a more challenging case. The accuracy and cross-validation score are not getting impacted by less important variables: Credit_History is dominating the model. We have two options now:

Feature Engineering: derive new information and try to predict with that. I will leave this to your creativity.

Better modeling techniques. Let’s explore this next.

Decision Tree

Decision tree is another method for making a predictive model. It is known to provide higher accuracy than logistic regression model. Read more about Decision Trees.

model = DecisionTreeClassifier()
predictor_var = ['Credit_History','Gender','Married','Education']
classification_model(model, df, predictor_var, outcome_var)

Accuracy : 81.930%
Cross-Validation Score : 76.656%

Here the model based on categorical variables is unable to have an impact because Credit History is dominating over them. Let’s try a few numerical variables:

#We can try different combinations of variables:
predictor_var = ['Credit_History','Loan_Amount_Term','LoanAmount_log']
classification_model(model, df, predictor_var, outcome_var)

Accuracy : 92.345%
Cross-Validation Score : 71.009%

Here we observed that although the accuracy went up on adding variables, the cross-validation score went down. This is the result of the model over-fitting the data. Let’s try an even more sophisticated algorithm and see if it helps:

Random Forest

Random forest is another algorithm for solving the classification problem. Read more about Random Forest.

model = RandomForestClassifier(n_estimators=100)
predictor_var = ['Gender', 'Married', 'Dependents', 'Education', 'Self_Employed', 'Loan_Amount_Term', 'Credit_History', 'Property_Area', 'LoanAmount_log', 'TotalIncome_log']
classification_model(model, df, predictor_var, outcome_var)

Accuracy : 100.000%
Cross-Validation Score : 78.179%

Here we see that the accuracy is 100% for the training set. This is the ultimate case of overfitting and can be resolved in two ways:

Reducing the number of predictors

Tuning the model parameters

Let’s try both of these. First we see the feature importance matrix from which we’ll take the most important features.

#Create a series with feature importances:
featimp = pd.Series(model.feature_importances_, index=predictor_var).sort_values(ascending=False)
print(featimp)

Let’s use the top 5 variables for creating a model. Also, we will modify the parameters of random forest model a little bit:

model = RandomForestClassifier(n_estimators=25, min_samples_split=25, max_depth=7, max_features=1)
predictor_var = ['TotalIncome_log','LoanAmount_log','Credit_History','Dependents','Property_Area']
classification_model(model, df, predictor_var, outcome_var)

Accuracy : 82.899%
Cross-Validation Score : 81.461%

Notice that although the accuracy reduced, the cross-validation score improved, showing that the model is generalizing well. Remember that random forest models are not exactly repeatable: different runs will result in slight variations because of randomization, but the output should stay in the ballpark.
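That run-to-run variation comes from the random number generator; pinning a seed (in scikit-learn, the random_state parameter of RandomForestClassifier) makes runs repeatable. The same idea illustrated with plain NumPy:

```python
import numpy as np

# Two generators seeded identically produce identical draws --
# this is what fixing random_state does for a random forest
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)

draws_a = rng_a.integers(0, 100, size=5)
draws_b = rng_b.integers(0, 100, size=5)
print(draws_a, draws_b)
```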

You would have noticed that even after some basic parameter tuning on the random forest, we have reached a cross-validation accuracy only slightly better than the original logistic regression model. This exercise gives us some very interesting and unique learnings:

Using a more sophisticated model does not guarantee better results.

Avoid using complex modeling techniques as a black box without understanding the underlying concepts. Doing so would increase the tendency of overfitting, thus making your models less interpretable.

Feature Engineering is the key to success. Everyone can use an XGBoost model, but the real art and creativity lies in enhancing your features to better suit the model.

You can access the dataset and problem statement used in this post at this link: Loan Prediction Challenge


Now, it’s time to take the plunge and actually play with some other real datasets. So are you ready to take on the challenge? Accelerate your data science journey with the following practice problems:

Frequently Asked Questions

Q1. How to learn python programming?

A. To learn Python programming, you can start by familiarizing yourself with the language’s syntax, data types, control structures, functions, and modules. You can then practice coding by solving problems and building projects. Joining online communities, attending workshops, and taking online courses can also help you learn Python. With regular practice, persistence, and a willingness to learn, you can become proficient in Python and start developing software applications.

Q2. Why Python is used?

A. Python is used for a wide range of applications, including web development, data analysis, scientific computing, machine learning, artificial intelligence, and automation. Python is a high-level, interpreted, and dynamically-typed language that offers ease of use, readability, and flexibility. Its vast library of modules and packages makes it a popular choice for developers looking to create powerful, efficient, and scalable software applications. Python’s popularity and versatility have made it one of the most widely used programming languages in the world today.

Q3. What are the 4 basics of Python?

A. The four basics of Python are variables, data types, control structures, and functions. Variables are used to store values, data types define the type of data that can be stored, control structures dictate the flow of execution, and functions are reusable blocks of code. Understanding these four basics is essential for learning Python programming and developing software applications.

Q4. Can I teach myself Python?

A. Yes, you can teach yourself Python. Start by learning the basics and practicing coding regularly. Join online communities to get help and collaborate on projects. Building projects is a great way to apply your knowledge and develop your skills. Remember to be persistent, learn from mistakes, and keep practicing.

End Notes

I hope this tutorial will help you maximize your efficiency when starting with data science in Python. I am sure this not only gave you an idea about basic data analysis methods, but also showed you how to implement some of the more sophisticated techniques available today.

You should also check out our free Python course and then jump over to learn how to apply it for Data Science.

Python is really a great tool and is becoming an increasingly popular language among data scientists. The reason: it is easy to learn and integrates well with other databases and tools like Spark and Hadoop. Above all, it offers great computational power and powerful data analytics libraries.

So, learn Python to perform the full life-cycle of any data science project. It includes reading, analyzing, visualizing and finally making predictions.

Note – The discussions of this article are going on at AV’s Discuss portal. Join here! If you like what you just read & want to continue your analytics learning, subscribe to our emails, follow us on twitter or like our facebook page.


Batch Edit Exif Metadata Of Images With Batch Exif Editor Software For Windows 11/10

This article talks about how you can batch edit EXIF metadata of images using Batch EXIF Editor software on Windows 11/10. EXIF, which stands for Exchangeable Image File Format, is a standard that describes several information tags for images and other media files taken by a digital camera. It may include details like camera exposure, camera model, date and time, GPS coordinates, and more. Now, how do you edit EXIF tags in a batch of photos on Windows 11/10? If you are wondering the same, here is a guide for you.

In this post, we will be discussing how you can add or edit various EXIF tags in several images simultaneously. You can use third-party freeware that enables you to modify EXIF tags. Let us check out these free batch EXIF editors in detail now.

How do I remove EXIF metadata from multiple pictures?

You can use free software to remove EXIF data from multiple pictures at once. We have mentioned some free tools that enable you to do so. You can use software like ImBatch or digiKam to remove all EXIF tags from a batch of photos simultaneously.  You can check out details on these software below. Besides that, you can also use ExifCleaner for removing EXIF tags from multiple images.

How do I add EXIF data to a JPEG file?

You can add EXIF data to a JPEG file using any of the listed software in this post. All the software on this list support JPEG image formats. So, simply import your JPEG images in any of these software and edit their EXIF data at once.

How to Batch Edit EXIF metadata of Images in Windows 11/10

You can use a free Batch EXIF Editor software that enables you to edit EXIF data of multiple images at once. There are multiple free programs available for Windows 11/10 that enable you to do so. Here are some of the better free software to batch edit EXIF information of multiple images on your Windows 11/10 PC:

ImBatch

digiKam

AnalogExif

Bulk Photo Edit

EXIF Date Changer

Let us discuss the above-listed free batch EXIF editor software in detail.

1] ImBatch

ImBatch is a free batch image processing software that lets you edit EXIF data of multiple images at once. It lets you edit and convert RAW and standard image formats. It offers several image editing tasks including image metadata editing. It lets you batch edit EXIF and IPTC tags of multiple images simultaneously. Let us check out the steps to use this batch EXIF editor.

How to batch edit EXIF data of images using ImBatch

Here are the main steps to edit EXIF data of photos in batch using this free software in Windows 11/10:

Download and install ImBatch.

Launch ImBatch.

Import multiple images that you want to edit.

Add a Set EXIF/IPTC Tag task.

Edit the desired EXIF tags.

Let us discuss the above steps in detail now.

Firstly, download and install this batch image processor called ImBatch. And then, launch this software to start using it.

After that, import the images you want to edit and add a Set EXIF/IPTC Tag task, as outlined above. Then select the tag name that you want to edit and enter its value in the given field. It lets you set a variety of EXIF and IPTC tags including artist, copyright, title, aperture, brightness, camera owner name, date/time, exposure, GPS coordinates, shutter speed, image ID, image description, date, and many more.

You can also use the plus button to add tag values from file attributes, functions, EXIF tags, etc.

Finally, execute the task; it will start batch processing your images with the edited EXIF tag values.

This software can also be used for image editing tasks like color correction, color adjustment, rotation, crop, resize, effects, annotate, and more. You can use it for free for non-commercial use only.

Read: Free Image Metadata viewer and editor for Windows.

2] digiKam

digiKam is a free and open-source batch EXIF editor software for Windows 11/10. It is a good software to view, edit, and manage RAW and other common images on your PC. It provides a dedicated batch feature that provides some tools to batch process images. Let us have a look at the steps to use this software now.

Here are the main steps to follow to batch edit EXIF information using digiKam on Windows 11/10:

Download and install digiKam.

Launch this software.

Browse and select source images.

Edit the tags you want to.

Press the Run button to execute the batch EXIF editing task.

First, you need to download and install digiKam on your Windows 11/10 PC. Then, start the GUI of this software.

Now, browse and select the input images that you want to batch process. And, press the Batch Queue Manager button.

Next, from the Base Tools tab, scroll down to the Metadata section and choose one of the desired metadata editing options. It offers three handy image information editing options including Apply Metadata Template, Remove Metadata, and Time Adjust. You can use all the options one by one.

It lets you edit EXIF, IPTC, and XMP information including dates (creation, digitized, original, etc.), author name, photo credit, copyright, right usage terms, source, instruction, location, etc.

After making changes to the metadata of multiple images, tap on the Run or Run all (for multiple tasks) button to start batch image processing.

Besides batch editing EXIF data, it also lets you perform some other image editing tasks like Noise Reduction, Sharpen Image, RedEye-Correction, Watermarking, Transform, Lens Auto-Correction, etc. You can even convert images from one format to another through this handy photo management software.

Read: How to edit or add Metadata to Photos & Video files in Windows.

3] AnalogExif

You can also try this free dedicated EXIF editor called AnalogExif. It is free software that allows you to edit EXIF data of multiple images at once. It is very easy to use and lets you edit a wide number of EXIF tags. Some of these EXIF tags include:

Camera model, camera serial number, camera manufacturer, flash model, flash manufacturer, lens serial number, lens manufacturer, lens model, maximum aperture, developer, process, author information, original capture time, digitized time, location, exposure, keywords, description, and many more.

The good thing is that it lets you import metadata information from another image and add it to the current images. It also offers an Auto-fill Exposure option. You can even add or edit camera equipment using it.

Here are the steps to use this free software to batch edit EXIF data of multiple images:

Firstly, download and install the AnalogExif software.

Then, start AnalogExif.

Now, import several images to it using its built-in file browser, edit the desired EXIF tags, and save your changes.

You can download it from

See: Remove Properties and Personal information from files, photos

4] Bulk Photo Edit

Bulk Photo Edit is a dedicated software to batch edit EXIF data of images in Windows 11/10. It lets you edit a few EXIF tags in images that include timestamp shift, GPS coordinates, and resolution-DPI. It is a portable and lightweight application that requires no installation. You can use it on the go. Let us discuss the main steps to use it.

You can use the following steps to bulk edit EXIF data using this portable software:

First, download Bulk Photo Edit from here.

Next, unzip the downloaded package.

Then, run the BulkPhotoEditGui application file.

Now, add your images, enable the tag you want to edit, and then add the new values.

It will edit and save EXIF tags in the selected images.

Read: Best Free Batch Photo Date Stamper software for Windows.

5] EXIF Date Changer

As the name suggests, you can try EXIF Date Changer to edit EXIF data in multiple images at once. It lets you adjust the time and set a new date and time. It is very easy to use. Here are the steps that you can follow to edit the EXIF date of multiple images in bulk:

Firstly, download and install EXIF Date Changer.

Then, start this software.

Now, select the folder containing source images or choose individual images.

Next, from the Time Difference tab, select the desired date adjustment option.

You can download this handy software from here.

That’s it!

Now read: Best Free Video Metadata Editor software for Windows.

A Primer On Network Communication

A Primer on Network Communication

We have always heard that to perform penetration testing, a pentester must be aware of basic networking concepts like IP addresses, classful and classless subnetting, ports and broadcast networks. The very first reason is that findings such as which hosts are live in the approved scope, and what services, ports and features they have open and responsive, determine what kind of activities an assessor is going to perform in the penetration test. The environment keeps changing and systems are often reallocated. Hence, it is quite possible that old vulnerabilities may crop up again, and without a good knowledge of scanning a network, it may happen that the initial scans have to be redone. In the subsequent sections, we will discuss the basics of network communication.

Reference Model

A reference model offers a means of standardization that is acceptable worldwide, since people using a computer network are located over a wide physical range and their network devices might have heterogeneous architectures. To provide communication among heterogeneous devices, we need a standardized model, i.e., a reference model, which defines the way these devices can communicate.

We have two reference models: the OSI model and the TCP/IP reference model. The OSI model is a hypothetical one, whereas TCP/IP is a practical model.

OSI Model

The Open Systems Interconnection (OSI) model was designed by the International Organization for Standardization (ISO), and therefore it is also referred to as the ISO-OSI Model.

The OSI model consists of seven layers, as shown in the following diagram. Each layer has a specific function and provides services to the layer above it.

Physical Layer

The Physical layer is responsible for the following activities −

Activating, maintaining and deactivating the physical connection.

Defining voltages and data rates needed for transmission.

Converting digital bits into electrical signal.

Deciding whether the connection is simplex, half-duplex or full-duplex.

Data Link Layer

The data link layer performs the following functions −

Performs synchronization and error control for the information that is to be transmitted over the physical link.

Enables error detection, and adds error detection bits to the data that is to be transmitted.

Network Layer

The network layer performs the following functions −

To route the signals through various channels to the other end.

To act as the network controller by deciding which route data should take.

To divide the outgoing messages into packets and to assemble incoming packets into messages for higher levels.

Transport Layer

The Transport layer performs the following functions −

It decides if the data transmission should take place on parallel paths or single path.

It performs multiplexing and splitting of the data.

It breaks the data groups into smaller units so that they are handled more efficiently by the network layer.

The Transport layer guarantees transmission of data from one end to the other.

Session Layer

The Session layer performs the following functions −

Manages the messages and synchronizes conversations between two different applications.

It controls logging on and off, user identification, billing and session management.

Presentation Layer

The Presentation layer performs the following functions −

This layer ensures that the information is delivered in such a form that the receiving system will understand and use it.

Application Layer

The Application layer performs the following functions −

It provides different services such as manipulation of information in several ways, retransferring the files of information, distributing the results, etc.

The functions such as LOGIN or password checking are also performed by the application layer.

TCP/IP Model

The Transmission Control Protocol and Internet Protocol (TCP/IP) model is a practical model and is used in the Internet.

The TCP/IP model combines the two layers (Physical and Data link layer) into one layer – Host-to-Network layer. The following diagram shows the various layers of TCP/IP model −

Application Layer

This layer is same as that of the OSI model and performs the following functions −

It provides different services such as manipulation of information in several ways, retransferring the files of information, distributing the results, etc.

The application layer also performs the functions such as LOGIN or password checking.

Following are some of the protocols used in the Application layer −

TELNET

FTP

SMTP

DNS

HTTP
Transport Layer

It does the same functions as that of the transport layer in the OSI model. Consider the following important points related to the transport layer −

It uses TCP and UDP protocol for end to end transmission.

TCP is a reliable and connection oriented protocol.

TCP also handles flow control.

UDP is an unreliable, connectionless protocol and does not perform flow control.

Both TCP and UDP are employed in this layer.
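The difference between the two transport protocols is visible directly in the sockets API. The following minimal Python sketch simply creates one socket of each kind; `SOCK_STREAM` selects TCP (reliable, connection-oriented) and `SOCK_DGRAM` selects UDP (connectionless, no flow control).

```python
import socket

# A TCP socket: connection-oriented, reliable, with flow control (SOCK_STREAM).
tcp_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# A UDP socket: connectionless, no delivery guarantee, no flow control (SOCK_DGRAM).
udp_sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

print(tcp_sock.type == socket.SOCK_STREAM)  # True
print(udp_sock.type == socket.SOCK_DGRAM)   # True

tcp_sock.close()
udp_sock.close()
```

The same address family (`AF_INET`, i.e., IPv4) can carry either transport; the choice of `SOCK_STREAM` versus `SOCK_DGRAM` is what selects TCP or UDP.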

Internet Layer

The function of this layer is to allow the host to insert packets into network and then make them travel independently to the destination. However, the order of receiving the packet can be different from the sequence they were sent.

Internet Protocol (IP) is employed in Internet layer.

Host-to-Network Layer

This is the lowest layer in the TCP/IP model. The host has to connect to network using some protocol, so that it can send IP packets over it. This protocol varies from host to host and network to network.

The different protocols used in this layer are −

Ethernet

Token Ring

Packet radio

Useful Architecture

Following are some useful architectures, which are used in network communication −

The Ethernet frame architecture

An engineer named Robert Metcalfe first invented the Ethernet network, defined under IEEE standard 802.3, in 1973. It was first used to interconnect and send data between a workstation and a printer. More than 80% of LANs use the Ethernet standard for its speed, lower cost and ease of installation. On the wire, data travels from host to host in frames. A frame is constituted by various components like MAC addresses, start and end delimiters, etc.

The Ethernet frame starts with a Preamble and SFD. The Ethernet header contains both the Source and Destination MAC addresses, after which the payload of the frame is present. The last field is CRC, which is used to detect errors. The basic Ethernet frame structure is defined in the IEEE 802.3 standard, which is explained below −

The Ethernet (IEEE 802.3) frame format

The Ethernet packet transports an Ethernet frame as its payload. Following is a graphical representation of Ethernet frame along with the description of each field −

Field Name | Preamble | SFD (Start of frame delimiter) | Destination MAC | Source MAC | Type | Data | CRC
Size (in bytes) | 7 | 1 | 6 | 6 | 2 | 46-1500 | 4


Preamble

An Ethernet frame is preceded by a preamble, 7 bytes in size, which informs the receiving system that a frame is starting and allows the sender as well as the receiver to establish bit synchronization.

SFD (Start of frame delimiter)

This is a 1-byte field used to signify that the Destination MAC address field begins with the next byte. Sometimes the SFD field is considered to be part of the Preamble. That is why the preamble is considered to be 8 bytes in many places.

Destination MAC − This is a 6-byte field wherein, we have the address of the receiving system.

Source MAC − This is a 6-byte field wherein, we have the address of the sending system.

Type − It defines the type of protocol inside the frame. For example, IPv4 or IPv6. Its size is 2 bytes.

Data − This is also called the Payload and the actual data is inserted here. Its length must be between 46 and 1500 bytes. If the length is less than 46 bytes, padding zeros are added to meet the minimum possible length, i.e., 46 bytes.

CRC (Cyclic Redundancy Check) − This is a 4-byte field containing 32-bit CRC, which allows detection of corrupted data.
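The header layout above can be parsed directly from raw bytes. A minimal Python sketch, assuming the preamble/SFD and trailing CRC have already been stripped (as is typical for frames obtained from a raw socket or a pcap capture); the frame bytes used here are hypothetical:

```python
import struct

def parse_ethernet_header(frame: bytes) -> dict:
    """Parse the 14-byte Ethernet header (destination MAC, source MAC, type).

    Assumes the preamble/SFD and CRC have already been stripped by the NIC.
    """
    dst, src, eth_type = struct.unpack("!6s6sH", frame[:14])
    fmt_mac = lambda b: ":".join(f"{x:02x}" for x in b)
    return {
        "destination_mac": fmt_mac(dst),
        "source_mac": fmt_mac(src),
        "type": hex(eth_type),        # e.g. 0x800 for IPv4, 0x86dd for IPv6
        "payload": frame[14:],        # Data field (46-1500 bytes)
    }

# A hypothetical frame: broadcast destination, made-up source MAC, IPv4 type,
# and a 46-byte minimum-size payload.
frame = bytes.fromhex("ffffffffffff" "005056c00001" "0800") + b"\x45" + b"\x00" * 45
hdr = parse_ethernet_header(frame)
print(hdr["destination_mac"])  # ff:ff:ff:ff:ff:ff
print(hdr["type"])             # 0x800
```

The `!` in the format string selects network (big-endian) byte order, which is how all the multi-byte fields in this chapter appear on the wire.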

Extended Ethernet Frame (Ethernet II frame) Format

Following is a graphical representation of the extended Ethernet frame using which we can get Payload larger than 1500 bytes −

Field Name | Destination MAC | Source MAC | Type | DSAP | SSAP | Ctrl | Data | CRC
Size (in bytes) | 6 | 6 | 2 | 1 | 1 | 1 | >46 | 4

The description of the fields, which are different from IEEE 802.3 Ethernet frame, is as follows −

DSAP (Destination Service Access Point)

DSAP is a 1-byte long field that represents the logical addresses of the network layer entity intended to receive the message.

SSAP (Source Service Access Point)

SSAP is a 1-byte long field that represents the logical address of the network layer entity that has created the message.


Ctrl (Control)

This is a 1-byte control field.

The IP Packet Architecture

Internet Protocol is one of the major protocols in the TCP/IP protocols suite. This protocol works at the network layer of the OSI model and at the Internet layer of the TCP/IP model. Thus, this protocol has the responsibility of identifying hosts based upon their logical addresses and to route data among them over the underlying network. IP provides a mechanism to uniquely identify hosts by an IP addressing scheme. IP uses best effort delivery, i.e., it does not guarantee that packets would be delivered to the destined host, but it will do its best to reach the destination.

In our subsequent sections, we will learn about the two different versions of IP.


IPv4

This is the Internet Protocol version 4, which uses a 32-bit logical address. Following is the diagram of the IPv4 header along with the description of its fields −


Version

This is the version of the Internet Protocol used; for example, IPv4.


IHL

Internet Header Length; the length of the entire IP header.


DSCP

Differentiated Services Code Point; this is the Type of Service.


ECN

Explicit Congestion Notification; it carries information about the congestion seen in the route.

Total Length

The length of the entire IP Packet (including IP header and IP Payload).


Identification

If the IP packet is fragmented during transmission, all the fragments contain the same identification number.


Flags

As required by the network resources, if the IP packet is too large to handle, these flags tell whether it can be fragmented or not. In this 3-bit field, the MSB is always set to ‘0’.

Fragment Offset

This offset tells the exact position of the fragment in the original IP Packet.

Time to Live

To avoid looping in the network, every packet is sent with some TTL value set, which tells the network how many routers (hops) this packet can cross. At each hop, its value is decremented by one and when the value reaches zero, the packet is discarded.


Protocol

Tells the Network layer at the destination host to which protocol this packet belongs, i.e., the next-level protocol. For example, the protocol number of ICMP is 1, TCP is 6 and UDP is 17.

Header Checksum

This field is used to keep checksum value of entire header, which is then used to check if the packet is received error-free.
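The header checksum is the standard Internet checksum: a one's-complement sum of the header taken as 16-bit words. A short Python sketch of the computation, using a well-known sample header (the addresses are from a private range):

```python
import struct

def ipv4_header_checksum(header: bytes) -> int:
    """Compute the Internet checksum: one's-complement sum of 16-bit words.

    The checksum field itself must be zeroed before calling.
    """
    if len(header) % 2:
        header += b"\x00"
    total = sum(struct.unpack(f"!{len(header)//2}H", header))
    while total > 0xFFFF:                  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# A 20-byte sample IPv4 header with the checksum field (bytes 10-11) zeroed.
header = bytes.fromhex("4500003c1c4640004006" "0000" "ac100a63ac100a0c")
checksum = ipv4_header_checksum(header)
print(hex(checksum))  # 0xb1e6

# Re-inserting the computed checksum makes the whole header checksum to zero,
# which is exactly the receiver's verification rule.
verify = header[:10] + struct.pack("!H", checksum) + header[12:]
print(ipv4_header_checksum(verify) == 0)  # True
```

The receiver runs the same computation over the entire header, checksum included; a result of zero means the packet was received error-free (as far as this check can tell).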

Source Address

32-bit address of the Sender (or source) of the packet.

Destination Address

32-bit address of the Receiver (or destination) of the packet.


Options

This is an optional field, which is used if the value of IHL is greater than 5. These options may contain values for options such as Security, Record Route, Time Stamp, etc.
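All of the fixed IPv4 fields described above can be unpacked from the first 20 bytes of a packet. A minimal parsing sketch in Python; the sample packet bytes and addresses are made up for illustration:

```python
import struct

def parse_ipv4_header(packet: bytes) -> dict:
    """Parse the fixed 20-byte part of an IPv4 header."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    return {
        "version": ver_ihl >> 4,
        "ihl": ver_ihl & 0x0F,            # header length in 32-bit words
        "total_length": total_len,
        "identification": ident,
        "flags": flags_frag >> 13,        # 3-bit Flags field
        "fragment_offset": flags_frag & 0x1FFF,
        "ttl": ttl,
        "protocol": proto,                # 1 = ICMP, 6 = TCP, 17 = UDP
        "source": ".".join(map(str, src)),
        "destination": ".".join(map(str, dst)),
    }

# A hypothetical header: version 4, IHL 5, TTL 64, protocol 6 (TCP),
# checksum left as zero since we are not validating it here.
pkt = bytes.fromhex("4500003c1c4640004006" "0000" "0a000001" "0a000002")
hdr = parse_ipv4_header(pkt)
print(hdr["version"], hdr["ihl"], hdr["ttl"], hdr["protocol"])  # 4 5 64 6
print(hdr["source"], "->", hdr["destination"])  # 10.0.0.1 -> 10.0.0.2
```

If IHL is greater than 5, the Options field occupies the bytes between offset 20 and `ihl * 4`, which this fixed-header sketch does not decode.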


IPv6

The Internet Protocol version 6 is the most recent communications protocol, which like its predecessor IPv4 works on the Network Layer (Layer-3). Along with offering an enormous amount of logical address space, this protocol has ample features that address the shortcomings of IPv4. Following is the diagram of the IPv6 header along with the description of its fields −

Version (4-bits)

It represents the version of Internet Protocol — 0110.

Traffic Class (8-bits)

These 8 bits are divided into two parts. The most significant 6 bits are used for the Type of Service to let the router know what services should be provided to this packet. The least significant 2 bits are used for Explicit Congestion Notification (ECN).

Flow Label (20-bits)

This label is used to maintain the sequential flow of the packets belonging to a communication. The source labels the sequence to help the router identify that a particular packet belongs to a specific flow of information. This field helps avoid re-ordering of data packets. It is designed for streaming/real-time media.

Payload Length (16-bits)

This field is used to tell the routers how much information a particular packet contains in its payload. Payload is composed of Extension Headers and Upper Layer data. With 16 bits, up to 65535 bytes can be indicated; but if the Extension Headers contain Hop-by-Hop Extension Header, then the payload may exceed 65535 bytes and this field is set to 0.

Next Header (8-bits)

Either this field is used to indicate the type of Extension Header, or if the Extension Header is not present then it indicates the Upper Layer PDU. The values for the type of Upper Layer PDU are same as IPv4’s.

Hop Limit (8-bits)

This field is used to stop packets from looping in the network indefinitely. This is the same as TTL in IPv4. The value of the Hop Limit field is decremented by 1 as it passes a link (router/hop). When the field reaches 0, the packet is discarded.

Source Address (128-bits)

This field indicates the address of originator of the packet.

Destination Address (128-bits)

This field provides the address of the intended recipient of the packet.
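The fixed IPv6 header is always exactly 40 bytes, which makes it straightforward to unpack. A short Python sketch; the sample packet is hypothetical, with zeroed addresses:

```python
import struct

def parse_ipv6_header(packet: bytes) -> dict:
    """Parse the fixed 40-byte IPv6 header."""
    vtc_flow, payload_len, next_header, hop_limit = struct.unpack("!IHBB", packet[:8])
    src, dst = packet[8:24], packet[24:40]
    return {
        "version": vtc_flow >> 28,              # 0110 binary, i.e. 6
        "traffic_class": (vtc_flow >> 20) & 0xFF,
        "flow_label": vtc_flow & 0xFFFFF,       # 20-bit flow label
        "payload_length": payload_len,
        "next_header": next_header,             # same values as IPv4's Protocol field
        "hop_limit": hop_limit,
        "source": src,                          # 128-bit addresses, left as raw bytes
        "destination": dst,
    }

# A hypothetical header: version 6, 20-byte payload, next header 6 (TCP),
# hop limit 64, both addresses all-zero for brevity.
pkt = struct.pack("!IHBB", 6 << 28, 20, 6, 64) + bytes(16) + bytes(16)
hdr = parse_ipv6_header(pkt)
print(hdr["version"], hdr["next_header"], hdr["hop_limit"])  # 6 6 64
```

Note that unlike IPv4 there is no header-length field: any variability lives in the chain of Extension Headers announced by the Next Header field.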

The TCP (Transmission Control Protocol) Header Architecture

As we know, TCP is a connection-oriented protocol, in which a session is established between two systems before starting communication. The connection is closed once the communication has been completed. TCP uses a three-way handshake technique to establish the connection socket between two systems. Three-way handshake means that three messages (SYN, SYN-ACK and ACK) are sent back and forth between the two systems. The steps between the two systems, the initiating and target systems, are as follows −

Step 1 − Packet with SYN flag set

First of all, the system that is trying to initiate a connection starts with a packet that has the SYN flag set.

Step 2 − Packet with SYN-ACK flag set

Now, in this step, the target system returns a packet with both the SYN and ACK flags set.

Step 3 − Packet with ACK flag set

At last, the initiating system returns a packet to the target system with the ACK flag set.
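In practice, the operating system's kernel performs this SYN / SYN-ACK / ACK exchange on our behalf when we call `connect()` and `accept()`. The following loopback sketch in Python demonstrates this: when `connect()` returns successfully, the three-way handshake has completed.

```python
import socket
import threading

# Set up a listening server on the loopback interface; port 0 asks the OS
# to pick any free port, so the example does not depend on a fixed port.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

def accept_one():
    # accept() completes only after the client's final ACK (step 3) arrives.
    conn, _ = server.accept()
    conn.close()

t = threading.Thread(target=accept_one)
t.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# connect() sends the SYN (step 1), waits for the SYN-ACK (step 2),
# and replies with the ACK (step 3) before returning.
client.connect(("127.0.0.1", port))
print("handshake complete")

client.close()
t.join()
server.close()
```

A packet capture tool such as Wireshark run against this script would show exactly the three packets described in the steps above.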

Following is the diagram of the TCP header along with the description of fields −

Source Port (16-bits)

It identifies the source port of the application process on the sending device.

Destination Port (16-bits)

It identifies the destination port of the application process on the receiving device.

Sequence Number (32-bits)

The sequence number of data bytes of a segment in a session.

Acknowledgement Number (32-bits)

When ACK flag is set, this number contains the next sequence number of the data byte expected and works as an acknowledgment of the previous data received.

Data Offset (4-bits)

This field implies both, the size of the TCP header (32-bit words) and the offset of data in the current packet in the whole TCP segment.

Reserved (3-bits)

Reserved for future use and set to zero by default.

Flags (1-bit each)

NS − Explicit Congestion Notification signaling process uses this Nonce Sum bit.

CWR − When a host receives a packet with the ECE bit set, it sets Congestion Window Reduced to acknowledge that ECE was received.

ECE − It has two meanings −

If SYN bit is clear to 0, then ECE means that the IP packet has its CE (congestion experience) bit set.

If SYN bit is set to 1, ECE means that the device is ECT capable.

URG − It indicates that Urgent Pointer field has significant data and should be processed.

ACK − It indicates that Acknowledgement field has significance. If ACK is cleared to 0, it indicates that packet does not contain any acknowledgment.

PSH − When set, it is a request to the receiving station to PUSH data (as soon as it comes) to the receiving application without buffering it.

RST − Reset flag has the following features −

It is used to refuse an incoming connection.

It is used to reject a segment.

It is used to restart a connection.

SYN − This flag is used to set up a connection between hosts.

FIN − This flag is used to release a connection and no more data is exchanged thereafter. Because packets with SYN and FIN flags have sequence numbers, they are processed in correct order.

Window Size

This field is used for flow control between two stations and indicates the amount of buffer (in bytes) the receiver has allocated for a segment, i.e., how much data is the receiver expecting.

Checksum − This field contains the checksum of Header, Data and Pseudo Headers.

Urgent Pointer − It points to the urgent data byte if URG flag is set to 1.

Options − It facilitates additional options, which are not covered by the regular header. Option field is always described in 32-bit words. If this field contains data less than 32-bit, padding is used to cover the remaining bits to reach 32-bit boundary.
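The fixed 20-byte TCP header described above, including the individual flag bits, can be decoded with a few lines of Python. The sample segment is a hypothetical SYN packet built for illustration:

```python
import struct

# Flag bit positions from LSB upward: FIN=0x001, SYN=0x002, ... NS=0x100.
TCP_FLAGS = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR", "NS"]

def parse_tcp_header(segment: bytes) -> dict:
    """Parse the fixed 20-byte TCP header."""
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urgent) = struct.unpack("!HHIIHHHH", segment[:20])
    data_offset = offset_flags >> 12      # header length in 32-bit words
    flag_bits = offset_flags & 0x1FF      # NS plus the 8 classic flag bits
    flags = [name for i, name in enumerate(TCP_FLAGS) if flag_bits & (1 << i)]
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "sequence": seq,
        "acknowledgement": ack,
        "data_offset": data_offset,
        "flags": flags,
        "window_size": window,
    }

# A hypothetical SYN segment: ephemeral port 54321 -> 80, data offset 5
# (no options), only the SYN flag set, window 65535.
seg = struct.pack("!HHIIHHHH", 54321, 80, 1000, 0, (5 << 12) | 0x002, 65535, 0, 0)
hdr = parse_tcp_header(seg)
print(hdr["flags"])         # ['SYN']
print(hdr["data_offset"])   # 5
```

A data offset greater than 5 would indicate the presence of the Options field between byte 20 and `data_offset * 4`, padded to a 32-bit boundary as noted above.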

The UDP (User Datagram Protocol) header architecture

UDP is a simple connectionless protocol, unlike TCP, which is a connection-oriented protocol. It involves a minimum amount of communication mechanism. In UDP, the receiver does not generate an acknowledgment of the packet received and, in turn, the sender does not wait for any acknowledgment of the packet sent. This shortcoming makes the protocol unreliable but lighter on processing. Following is the diagram of the UDP header along with the description of its fields −

Source Port

This 16-bit field is used to identify the source port of the packet.

Destination Port

This 16-bit field is used to identify the application-level service on the destination machine.


Length

The length field specifies the entire length of the UDP packet (including the header). It is a 16-bit field and the minimum value is 8 bytes, i.e., the size of the UDP header itself.


Checksum

This field stores the checksum value generated by the sender before sending. In IPv4 this field is optional, so when the checksum field does not carry any value, all its bits are set to zero.
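The whole UDP header is just these four 16-bit fields, so parsing it is a one-liner. A minimal Python sketch; the port numbers and payload are made up for illustration:

```python
import struct

def parse_udp_header(datagram: bytes) -> dict:
    """Parse the fixed 8-byte UDP header."""
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {
        "source_port": src_port,
        "destination_port": dst_port,
        "length": length,                 # header + payload, minimum 8
        "checksum": checksum,             # 0 means "no checksum" over IPv4
        "payload": datagram[8:],
    }

# A hypothetical datagram: ephemeral port 33333 -> 53 (DNS), checksum omitted.
payload = b"hello, world"
dgram = struct.pack("!HHHH", 33333, 53, 8 + len(payload), 0) + payload
hdr = parse_udp_header(dgram)
print(hdr["destination_port"], hdr["length"])  # 53 20
```

The contrast with the 20-byte-minimum TCP header above makes clear why UDP is described as lighter on processing: there are no sequence numbers, flags or window to maintain.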

To study UDP in detail, please refer to this link: User Datagram Protocol


Correspondence In Sap – Configuration & Types

There are various standard correspondence types available like invoice print, account statement etc. Custom correspondence types can also be created.

Correspondences can be created at the time of particular business transaction processing or at a later stage for already created transaction postings.

Correspondence can be sent to a customer/vendor in various formats like email and fax. Correspondence is basically letters, etc., which are sent from SAP to the vendor/customer.

Correspondence can be created individually or collectively, ad-hoc or via automated batch job.

Types of correspondence

Following is an example list of various standard correspondence types, which can be copied to create a specific custom form, program, etc.

Correspondence Type | Correspondence Description | Print Program | Required Data | Sample standard SAP Script Form
SAP01 | Payment notices | RFKORD00 | Document number | F140_PAY_CONF_01
SAP06 | Account statements | RFKORD10 | Account number and date | F140_ACC_STAT_01
SAP07 | Bill of exchange charges statements | RFKORD20 | Document number | F140_BILL_CHA_01
SAP09 | Internal documents | RFKORD30 | Document number | F140_INT_DOCU_01
SAP10 | Individual letters | RFKORD40 | Account number | F140_IND_TEXT_01
SAP11 | Document extracts | RFKORD50 | Document number | F140_DOCU_EXC_01
SAP13 | Customer statements | RFKORD11 | Customer number and date | F140_CUS_STAT_01

How to do Correspondence configuration

Configuration of correspondence in SAP can be carried out in the following steps

Step 1) Define Correspondence Type

Transaction Code:-OB77

Here various SAP standard correspondence types are available. You can also create your own custom correspondence types. You can specify what data is required for generating a correspondence, e.g. for an account statement you can specify that the customer/vendor master is necessary for the statement. Also, you can specify the date parameters and the text to appear for date selection.

Step 2) Assign Program to Correspondence Type

Transaction Code: –OB78

Here you need to link the correspondence generator program to the correspondence type. You can also specify different programs for different company codes. (Also, you can specify the default variant here for the program to execute. You can create such a variant from transaction SE38/SA38 for the program.)

You can also create your own custom program as a copy of the standard program and make suitable changes to meet any of your client-specific needs.

Step 3) Determine Call-Up Functions for Correspondence Type

Transaction Code:-OB79

Here you need to specify at what point in time you can generate the particular correspondence type. You can also specify different settings for different company codes. The various options available are:-

At the time of posting payments (e.g. F-28, F-26, etc.)

At the time of document display or change (e.g. FB02, FB03, etc.)

At the time of account display (e.g. FBL1N, FBL5N, etc.)

Step 4) Assign Correspondence Form to Correspondence Print Program

Transaction Code: –OB96

Here you need to specify which form definition will be used by the correspondence print program. You can also specify different settings for different company codes. (The SAP Script form is defined using transaction SE71, where the various data is arranged in the output format to be processed. This SAP Script form defines the layout of the output.)

You can also use two digit form IDs, by which you can call different forms for different form IDs in the same company code.

This form ID can be given in the selection screen of the print program generating correspondence. You can select only one form ID at one time for a correspondence type. You can create multiple correspondence types, triggering different form ids.

Step 5) Define Sender Details for Correspondence

Transaction Code:-OBB1

Here you can link the details for the header, footer, signature and sender. This text is defined using transaction SO10 with the text ID as linked above (e.g. ADRS). You can also specify different settings for different company codes. (Also, a two-digit sender variant can be defined, which you can give in the selection parameters of the print program. This enables different sender details within the same company code.)

Step 6) Define Sort Variants for Correspondence

Transaction Code: –O7S4

Here you can specify the order in which the correspondence letters will be generated. E.g., if you are generating account statements for multiple vendors, the vendors will be sorted in this order and then the letters will be generated. This sort variant can be given in the selection screen of the print program generating the correspondence.

Step 7) Define Sort Variants for Line Items in Correspondence

Transaction Code: –O7S6

Here you can specify the order in which the various line items will appear in a correspondence letter. E.g., if a vendor account statement has multiple invoices, the invoices will be sorted in this order and then the letter will be generated.

This Sort Variant can be given in the selection screen of the print program generating correspondence.

Correspondence Generation

As shown earlier while configuring the call-up points, the correspondence can be generated at the following points in time:-

At the time of posting payments (e.g. F-28, F-26, etc.)

At the time of document display or change (e.g. FB02, FB03, etc.)

At the time of account display (e.g. FBL1N, FBL5N, etc.)

Correspondence can be generated for a particular document or for vendor(s)/customer(s) accounts. Subsequent sections explain the generation of correspondence in different ways and its printing.

Correspondence Generation (Method A):-

The correspondence can be generated while you create, change or display the document.

Similarly, you can create the correspondence from document display/change transactions, like FB02/FB03/FBL1N/FBL5N, etc.

Correspondence Generation (Method B)

For existing accounting documents you can use transaction code FB12.

Here, after entering the company code, it will ask for the correspondence type. Select the correspondence type and it will ask you to enter the document number/account number, etc., based on the correspondence type settings. After this, the correspondence is requested.

Correspondence Generation (Method C)

From transaction F.27, you can generate the correspondence (Account Statement) for vendor(s) / customer(s).

If you select the “Individual Request” check box, then if the same vendor/customer has line items in multiple company codes, a separate statement will be generated for each company code.

Correspondence Printing

Correspondence Printing (Method A):-

Use transaction code F.61 to print the relevant correspondence type already generated. On execution, it will simply print the correspondence. (If email/fax, etc. is configured, the output will be generated in that format.)

Correspondence Printing (Method B):-

From transaction F.64, you can see the correspondence letter (spool) generated and can print it. (The difference from F.61 is that in F.64 you can also perform other operations, like delete, print preview, etc., for correspondence requests already generated.)

Correspondence Via Email

Maintain the email address and the standard communication method as email in the customer/vendor master. Then the correspondence for this customer/vendor will be generated in email format instead of print output (taking into account the user exit setting made to determine the method of communication).

(Note: Similarly you can make setting for Fax output via selecting the standard communication as FAX and maintaining Fax no.)
