Ecommerce Scams Everyone Should Be Aware Of

Updated February 2024

The ease and convenience of eCommerce have made it an essential part of our lives. But, with this comes the risks associated with online shopping – cybercriminals are always looking for opportunities to perpetrate fraud. To protect yourself from these scams, familiarize yourself with some of the most common eCommerce scams and how to avoid them.

Fake Shopping Site Scams

One of the popular eCommerce scams people should look out for is fake shopping sites. These types of frauds are designed to steal customer information or money, often pretending to be a valid online shop. For example, the Wayfair scam involved a website that appeared to be an actual and legitimate discount furniture store but was a ruse to extract payment and personal details from unsuspecting customers.

Consumers should always exercise caution when it comes to shopping online. They must ensure they fully vet potential suppliers before providing them with financial details or their credit card information.

Gift Card Scams

Watch out for gift card scams as they become increasingly frequent in the eCommerce industry. Fraudsters craft seemingly legitimate cards at discounted rates to tempt unsuspecting buyers. Make a purchase, however, and you’ll likely get either a counterfeit or absolutely nothing in return.

As a precaution against gift card scams, always obtain your cards from reliable sources. This includes the retailer’s official website or eCommerce site you are looking to purchase from. Also, avoid any discounted offers, as these could be counterfeit items. Furthermore, never give out personal information when purchasing gift cards because legitimate businesses usually do not need such data.

Counterfeit Products

The eCommerce industry is facing a dangerous rise in counterfeit products. Fraudsters are creating knock-off versions of legitimate goods and selling them online at discounted prices. Despite their convincing exterior, these counterfeits usually lack quality and can be hazardous to your health or safety.

To shield yourself from counterfeit products, purchase items only through trusted sources. Read the reviews and responses of previous customers, then research carefully to confirm that the product you are purchasing is genuine. Exercise caution with goods sold at an unusually low price, since they could be imitations.

Phishing Scams

Phishing scams are one of online criminals’ most frequently used tactics. These fraudsters erect fake websites that imitate legitimate eCommerce sites to acquire personal information, such as passwords and credit card numbers. Furthermore, they might transmit emails with links or act like real businesses to extract sensitive data from you. By being vigilant and aware of these dangers, we can safeguard ourselves against phishing attacks.

Fake Shipping and Delivery Notifications

Ransomware Attacks

Ransomware attacks are menacing online scams that use phishing emails or counterfeit eCommerce sites to gain access and install malicious software, locking you out of your device or files until the ransom is paid. This puts companies – small and large – at risk for complete data loss with no recourse but to pay up if they want their information restored.

To protect yourself from ransomware threats, keep all your applications up to date and use anti-malware and antivirus software. Furthermore, be wary of any emails or messages sent by unfamiliar sources, and do not open links or attachments that seem even slightly suspicious. Following these simple tips makes it much harder for attackers to compromise your device.



You’ll Never Be #1, Nor Should You Try To Be…


As a culture, especially in America, we like to rank things. There are top 10 lists of all sorts and competitions to determine who’s the best in a given category, and even outside of these so-called competitions, people and companies claim to be “the best” or “number one.” Here’s a question for you: how can there be millions of “the world’s best pizza” pizzerias? There can’t. Deep down, I think we all understand that it’s a marketing ploy by all those restaurants. But then, how do you know if someone is number one? More importantly, does it even matter? I say not. Stay with me to find out why.

First off, I want you to know that many of the concepts I’m sharing here are from a thinker I admire more and more every day, and that’s Simon Sinek. He recently came out with a book called The Infinite Game, which I encourage you to check out. I got introduced to his thinking about 10 months ago via a podcast and his words shifted my thinking dramatically. And I hope you’ll have the same epiphanies now as I did almost a year ago.

The core of what Simon teaches is that there are two types of games: finite and infinite. Most of us are familiar with finite games. A game of baseball is a finite game. A sports league constitutes a finite game. The hot dog eating contest at your local fair counts. What each of these has in common is that there are set rules, players, objectives and timelines that are agreed upon by everyone. Pretty much any sport or competition is an example of a finite game, as long as it meets those criteria. And I’ll be the first to admit, I love sports and more specifically, sports metaphors. However, there’s one flaw with sports analogies and that’s the fact that they’re finite and not infinite.

The difference between a finite game and an infinite game is that in an infinite game there are unknown rules, players, objectives and timelines because no one has agreed upon them. No one wins in business. Business, like politics, is an infinite game. The sole objective of the game is to keep playing. How do you keep playing? You keep improving over time.

A huge problem with our society, in business and in politics, is that we have leaders who don’t know the game they’re playing. They’re viewing things from a finite perspective. It’s most evident in phrases such as “we’re number one” or “we’re the best.” No they’re not. According to whose definition?

Let’s examine a company that played by a finite rationale in an infinite game. Remember Blockbuster? In case you don’t, it was a movie rental company. It was a household name and was poised to become what Netflix is today by entering the streaming game in the 2000s. One executive certainly wanted that but his board disagreed. With the benefit of hindsight, you might be wondering how that could be. It’s because at the time late fees accounted for 12% of profits for Blockbuster. By switching to streaming, they’d lose out on that revenue because there’d be no DVDs to return. That 12% loss would only have been temporary, as Netflix has shown, but it was something the Blockbuster board was unwilling to let go of. And long story short, they’re out of business.

When you think about it, it’s no coincidence that this happens to companies. Why is it that taxi companies didn’t invent Uber? Or why haven’t hotels created Airbnb? It’s because companies are too concerned with maintaining the status quo, wrongly thinking that it’ll preserve their dominance. That’s an example of finite thinking in an infinite game.

Here’s a concrete example of someone who exercised existential flexibility: Steve Jobs. Apple was almost ready to release their computers when Jobs met up with some people who showed him a graphic interface. Immediately he went back to his team and said they needed to jump on that and make it part of their product. Many people told him it was impossible and that they’d not only miss the deadline but bankrupt the company. Steve said, “Better us than someone else.”

The rest, as they say, is history. That decision changed the computer industry and technology landscape as we know it. Steve wasn’t concerned with the short term. He had a mission, or mindset if you want to call it that, for Apple to develop user-friendly products for their customers to make their lives easier and better. That value still holds today and is one that doesn’t have an end. It’s ongoing. Hence, it’s part of the infinite game where the objective is to keep going, improving, developing, etc.

So how come there’s so much discourse around us about being the best and so forth? Well, because as I mentioned, many people in high-up positions aren’t aware of the game they’re playing. By definition, finite games are easier to understand and thus they’re perpetuated through sports metaphors and culture in general. Consider a song like Nelly’s “Number 1.” It’s a catchy tune that makes you feel good. But that’s because as a society we’ve tried to simplify what success is and how it should be defined for people. This is bullshit. You’ll never be #1, nor should you want to be. Keep playing the game, aka keep making content. When you do that, you’re successful.

In the past I’ve described the entertainment industry as a shaking tree. As long as you can hang on to that tree, you’ll do well. That’s a great analogy for the infinite game and while I may have talked about the infinite game primarily in terms of companies, this principle applies equally to us as individuals. The infinite game mindset is a lifestyle. By thinking and acting in these terms, you won’t be trying to shove a square peg in a round hole which means you won’t be as stressed out. I certainly shed a lot of anxiety once I learned about the infinite game. This is because I stopped putting so much pressure on the short term gains and focused on my long term goals and objectives. I had a mission for myself and that’s my guiding light every day. And when I see something that can help me achieve my mission easier, faster, smoother or perhaps better… well then I embrace it. Social media is an example of this. Rather than think of social media as good or bad, it is a tool which I use to help reach, inspire and teach people along with the other tools and tactics at my disposal. That’s the way you need to see it. 

The other way in which making the shift to an infinite mindset helps you is that you’ll stop being jealous of people. When you’re so desperate to be #1 you become vicious in your pursuits. It’s why there are stereotypes about the LA lifestyle of people being overly narcissistic. Mind you, this is an upward trend throughout the country but it just happens to be more easily noticeable in artists. The point being though, you’ll no longer see people as threats. They may be your rivals but a rival is good. A rival is like a mirror that showcases your weaknesses so you can adjust and grow. Remember, the goal is to stay in the game. You can’t stay in the game if you don’t improve.

Look at it from this perspective: over the next decade alone, the ways in which we create and consume content will change drastically many times over. You need to be able to keep up. If you view it in finite terms, such as “I know how to do x, y and z and I’m great at those,” you’ll eventually be blindsided because a shift will happen sooner or later and you’ll have made yourself obsolete. Be a constant learner. That’s really what the infinite game is all about.

Here’s where it gets trippy, at least to some people. The infinite game, aka your life or career, is comprised of finite games. Plainly stated, it means you should have deadlines for yourself. The difference is don’t beat yourself up if you miss a certain deadline. For example, my goal has been to write a novel and when I began the process I obviously had written nothing, hence I created timelines of when I wanted things to be done by. Boy, was I off on those. But, by doing that, it propelled me to knuckle down and get to work. Now I’m in the final phase of the novel! So do the same. Set deadlines… but don’t be harsh on yourself if you miss them because you’re still better off than when you began, right? And in the words of Obama, “better is good.” Simon Sinek thinks so. And I agree. 

The Questions You Should Be Asking References

Employers conduct reference checks by contacting a job candidate’s professional and personal connections. The goal is to better understand the candidate’s skills, qualifications and demeanor.

Your reference check questions should discern whether a candidate would fit in at your company. They cannot pertain to your candidate’s personal information.

Your company should develop a process to ensure consistency among all reference checks and determine which questions to ask references.

This article is for business owners and hiring managers who are planning to conduct reference checks for prospective employees.

A job candidate may ace the interview, but that doesn’t always make them a perfect hire. You can better understand an applicant’s compatibility with your company by checking their references, especially if you ask the right questions. We’ll share 32 reference check questions that focus on a candidate’s performance and what it was like to manage and work alongside them. These questions can help ensure a successful hire and a valuable new team member.

What is a reference check?

A reference check is when an employer reaches out to people who can shed light on a job candidate’s strengths and speak to their qualifications. These contacts tend to be previous employers but also may include university professors, longtime colleagues and other people familiar with the applicant’s work. 

As an employer, you may find that reference checks help paint a full picture of a potential hire. Unfortunately, people lie on their resumes sometimes and present qualifications they don’t actually possess. If you ask your applicant’s professional references the right questions, you’ll learn more about the candidate’s skills and qualifications than you would from a traditional job interview alone.

Reference check goals include the following: 

Confirm the written or verbal information the potential employee provided.

Learn about the candidate’s skills and strengths from someone other than the candidate.

Gather information about the applicant’s job performance in past roles to predict their success at your company.

With all of this information, you should have an easier time choosing which candidates to move forward in the hiring process.

Did You Know?

Reference checks can help you avoid hiring horror stories and costly personnel and management headaches.

What information should you ask a reference?

When developing your list of reference check questions, you should determine the information you want to confirm about the job candidate. You may be interested in the references’ insights about the candidate on these topics:

Job performance

Ability to understand and follow directions

Ability to work well as part of a team

Standards for office behavior and ethics

Interests, specialties and demeanor

Ability to give directions and ensure that subordinates follow them (if they’re applying for a leadership role)

Anything else that stands out on the candidate’s resume or emerged during their job interview

Some of these topics are more appropriate to discuss with professional references; others may be more suitable to ask personal references. For example, a former supervisor can speak to how well a candidate operates as part of a team, while a close friend or mentor can describe the candidate’s interests, specialties and demeanor.

Just as there are specific questions you should never ask a job candidate, there are questions you can’t ask a reference. You must only ask questions that pertain to the job; inappropriate questions can subject your company to discrimination claims. 

Consider the following problematic questions you should never ask references:

Anything related to demographics or personal information: Don’t ask about a candidate’s sexuality, age, religion or similar matters.

Anything related to personal health: Don’t ask about a candidate’s medical history or the existence of disabilities. You can ask whether the candidate is capable of performing the tasks the job requires.

Anything related to credit scores: Although you can request a credit score from a job applicant, the Fair Credit Reporting Act bars you from asking references about an applicant’s credit score.

Anything related to family: Don’t ask whether a candidate has (or plans to have) children or a spouse. If you worry that a job applicant with a family might not have enough time for the job, ask references if they think the job’s time demands will suit the candidate.


Gathering references is an important step to ensuring you make the best hiring decisions for your vacant positions. Check out these other tips for hiring the best employees to build your team as effectively as possible.

32 reference check questions to ask

Now that you know what information to request from a reference, you’re ready to develop your list of reference check questions. Below are 32 common reference check questions to use. You may think some don’t apply to your company, but you should speak with your hiring manager before eliminating any questions.

Introductory reference check questions

Is there any information you and/or your company are unwilling or unable to give me about the candidate?

If you can’t share any information with me, can you connect me with any former employees who worked closely with the candidate?

Can you confirm the candidate’s employment start and end dates, salary and job title?

What is your relationship to the candidate, and how did you first meet?

Reference check questions for getting to know the reference

For how long have you worked at your company?

For how long have you had your current job title?

For how long did you work with the candidate, and in what capacities?

Can you think of any reasons I should be speaking with another reference instead of yourself?

Performance-related reference check questions

What positions did the candidate have while at your company?

In what roles did the candidate start and end?

What did these roles entail?

What were the most challenging parts of the candidate’s roles at your company?

How did the candidate face these challenges and other obstacles?

What are the candidate’s professional strengths, and how did they benefit your company?

In what areas does the candidate need improvement?

Do you think the candidate is qualified for this job, and why or why not?

Reference check questions to ask managers

For how long did you directly or indirectly manage the candidate?

In what ways was managing the candidate easy, and in what ways was it challenging?

How did the candidate grow during their time working under you?

What suggestions do you have for managing this candidate?

Reference check questions to ask employees who reported to your candidate

For how long did the candidate manage you, and in what capacity?

What did you like most and least about the candidate’s management style?

How did the candidate’s management style help you grow and learn?

How could the candidate have better managed you and your co-workers?

Reference check questions to ask co-workers

For how long were you among the candidate’s colleagues, and in what capacity?

What did you like most and least about working with the candidate?

How did you grow and learn while working with the candidate?

How did the candidate support you and your other colleagues?

In what ways could the candidate have been a better co-worker to you and your colleagues?

Reference check questions about ethics and behavior

Why did the candidate leave your company?

Did this candidate’s behavior lead to any workplace conflicts or instances of questionable ethics?

If the opportunity arose, would you be willing and/or able to rehire the candidate, and why or why not?

Just as you can speak with your hiring manager about potentially removing certain questions from this list, you can discuss adding other questions. As long as any additional questions shed light on how your candidate would perform during employment with your company and you don’t ask for personal information, there’s a good chance you’re asking the right questions.

Did You Know?

Some candidates may need more scrutiny than others. Some employers conduct background checks to verify job candidates and their credentials.

How to conduct a reference check

If you decide to check references for new hires, implement a formal procedure at your company. This will streamline the process of obtaining your candidates’ references. From start to finish, your hiring team should follow these steps to conduct a thorough reference check:

Decide how many references to obtain from each applicant. Two or three should suffice.

Include a section for references in every job application. Ask candidates to include their references’ full names, phone numbers, email addresses and relationship to the candidate.

Get permission to contact the reference. Include a clause in your job application that the applicant signs to give you permission to contact their references. You should also email a reference to get their permission to ask them questions about the candidate.

Decide whether you’ll conduct your reference checks by phone or email. While sending questions by email will save your company time — especially if you have a standard list of questions you send to all references — verbal checks via phone or video chat, or even in-person meetings, can offer you a clearer understanding of a candidate.

Develop a list of reference check questions. Consider the list above to determine potential questions.

Watch out for red flags. Not every candidate is entirely truthful on their resume, so do your research before contacting a reference.

Establish a standard note-taking process. Don’t expect to remember every single thing you discussed during a reference check. Work with your hiring team to develop a note-taking format and process the whole team can understand and use.


If an employer discovers that a job candidate misrepresented their qualifications or lied on their resume, they can rescind the job offer.

Reference checks help employers make good hiring decisions

Reference checks give you a chance to fill gaps that arise while you’re getting to know a candidate during the interview process. Talking to an applicant’s personal references can tell you if they’re the right fit and help you avoid a costly bad hire. By allowing you to discover the candidate’s management style or determine how they’ll respond under pressure, reference checks can tell you much more than an interview alone.

Once you’ve conducted reference checks on all of your job candidates, you should have all of the information you need to decide which one is best for the job and reach out with a formal job offer letter. If the candidate accepts, congratulate them and yourself — and start your onboarding process.

Natalie Hamingson contributed to this article.

8 Job Blogs All Graduates Should Be Following

With this in mind, here are eight of the best job blogs that all recent graduates should be following.

Graduate Coach

This is a great all-encompassing blog that features articles pertaining to every aspect of post-graduation life and entering the job hunt space. You can really do your homework to make yourself as attractive as possible to some of the best executive recruitment agencies in the country.

Education Hub

Another website that caters to both pre-graduate and postgraduate individuals, presenting lots of different articles and resources that aim to guide people into postgraduate life and the subsequent job market.

Office For Students

This site is particularly good at creating articles that have a special focus on helping students to adapt to post-graduate life, providing tips that will help individuals to get on their chosen career path as early as possible for the best head start versus their peers.


Graduate Outcomes

This site offers a lot of data that has been collected from previous graduates to help new graduates see exactly what kind of trajectory they might be on based on their qualifications and goals. It is really helpful to have access to real data that has been collected and collated into helpful predictions etc.

Digital Marketing Institute

On this site, you will find a lot of information and help for making yourself as attractive as possible to prospective employers in the business world. In this modern age of online job hunting, it is vital that you present your best self at all times.

LinkedIn

Though not strictly a blog site, LinkedIn is the central hub of all modern graduate job hunting these days, and having a strong presence and profile on the site is the best thing you can do to make sure you are seen.

12 Important Model Evaluation Metrics For Machine Learning Everyone Should Know (Updated 2023)


The idea of building machine learning models or artificial intelligence or deep learning models works on a constructive feedback principle. You build a model, get feedback from metrics, make improvements, and continue until you achieve a desirable classification accuracy. Evaluation metrics explain the performance of the model. An important aspect of evaluation metrics is their capability to discriminate among model results.

In this tutorial, you will learn about several evaluation metrics in machine learning, like confusion matrix, cross-validation, AUC-ROC curve, and many more classification metrics.

You will also learn about the different metrics used for logistic regression for different problems.

Lastly, you will learn about cross-validation.

What Are Evaluation Metrics?

Evaluation metrics are quantitative measures used to assess the performance and effectiveness of a statistical or machine learning model. These metrics provide insights into how well the model is performing and help in comparing different models or algorithms.

When evaluating a machine learning model, it is crucial to assess its predictive ability, generalization capability, and overall quality. Evaluation metrics provide objective criteria to measure these aspects. The choice of evaluation metrics depends on the specific problem domain, the type of data, and the desired outcome.

I have seen plenty of analysts and aspiring data scientists not even bothering to check how robust their model is. Once they are finished building a model, they hurriedly map predicted values on unseen data. This is an incorrect approach. The truth is that building a predictive model is not the goal in itself. The goal is to create and select a model that achieves high accuracy on out-of-sample data. Hence, it is crucial to check the accuracy of your model prior to computing predicted values.

In our industry, we consider different kinds of metrics to evaluate our ML models. The choice of evaluation metric depends on the type of model and the implementation plan of the model. After you are finished building your model, these 12 metrics will help you evaluate its performance. Considering the rising popularity and importance of cross-validation, I’ve also mentioned its principles in this article.

Types of Predictive Models

When we talk about predictive models, we are talking either about a regression model (continuous output) or a classification model (nominal or binary output). The evaluation metrics used in each of these models are different.

In classification problems, we use two types of algorithms (dependent on the kind of output it creates):

Class output: Algorithms like SVM and KNN create a class output. For instance, in a binary classification problem, the outputs will be either 0 or 1. However, today we have algorithms that can convert these class outputs to probability. But these algorithms are not well accepted by the statistics community.

Probability output: Algorithms like Logistic Regression, Random Forest, Gradient Boosting, Adaboost, etc., give probability outputs. Converting probability outputs to class output is just a matter of creating a threshold probability.
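To make the thresholding step concrete, here is a minimal sketch in plain Python. The function name `to_class_labels` and the sample scores are made up for illustration; the 0.5 default simply mirrors the threshold used later in this article.

```python
def to_class_labels(probabilities, threshold=0.5):
    """Map each predicted probability to class 1 if it meets the threshold, else class 0."""
    return [1 if p >= threshold else 0 for p in probabilities]


# Hypothetical scores from a probability-output model.
scores = [0.91, 0.48, 0.50, 0.07]

default_labels = to_class_labels(scores)        # threshold of 0.5
strict_labels = to_class_labels(scores, 0.9)    # a stricter, assumed threshold

print(default_labels)
print(strict_labels)
```

Raising the threshold turns borderline cases into class 0, which is why the choice of threshold affects every class-based metric discussed below.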

In regression problems, we do not have such inconsistencies in output. The output is always continuous in nature and requires no further treatment.

Illustrative Example

For a classification model evaluation metric discussion, I have used my predictions for the problem BCI challenge on Kaggle. The solution to the problem is out of the scope of our discussion here. However, the final predictions on the training set have been used for this article. The predictions made for this problem were probability outputs which have been converted to class outputs assuming a threshold of 0.5.

Confusion Matrix

A confusion matrix is an N X N matrix, where N is the number of predicted classes. For the problem in hand, we have N=2, and hence we get a 2 X 2 matrix. It is a performance measurement for machine learning classification problems where the output can be two or more classes. In the binary case, it is a table with 4 different combinations of predicted and actual values. It is extremely useful for measuring precision, recall, Specificity, Accuracy, and most importantly, AUC-ROC curves.

Here are a few definitions you need to remember for a confusion matrix:

True Positive: You predicted positive, and it’s true.

True Negative: You predicted negative, and it’s true.

False Positive (Type 1 Error): You predicted positive, and it’s false.

False Negative (Type 2 Error): You predicted negative, and it’s false.

Accuracy: the proportion of the total number of predictions that were correct.

Positive Predictive Value or Precision: the proportion of predicted positive cases that were actually positive.

Negative Predictive Value: the proportion of predicted negative cases that were actually negative.

Sensitivity or Recall: the proportion of actual positive cases which are correctly identified.

Specificity: the proportion of actual negative cases which are correctly identified.

Rate: a measure derived from the confusion matrix. There are 4 types: TPR, FPR, TNR, and FNR.
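The definitions above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the code used for the Kaggle problem: the helper names and the ten toy labels are assumptions, and class 1 is treated as the positive class.

```python
def confusion_counts(actual, predicted):
    """Count TP, TN, FP, FN for binary labels, treating 1 as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    return tp, tn, fp, fn


def safe_ratio(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is empty."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted)
    return {
        "accuracy": safe_ratio(tp + tn, tp + tn + fp + fn),
        "precision": safe_ratio(tp, tp + fp),                  # positive predictive value
        "negative_predictive_value": safe_ratio(tn, tn + fn),
        "recall": safe_ratio(tp, tp + fn),                     # sensitivity, TPR
        "specificity": safe_ratio(tn, tn + fp),                # TNR
    }


# Toy labels, made up for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

Note that precision and negative predictive value divide by the number of predicted cases, while recall and specificity divide by the number of actual cases.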

The accuracy for the problem in hand comes out to be 88%. As you can see from the above two tables, the Positive Predictive Value is high, but the negative predictive value is quite low. The same holds for Sensitivity and Specificity. This is primarily driven by the threshold value we have chosen. If we decrease our threshold value, the two pairs of starkly different numbers will come closer.

In general, we are concerned with one of the above-defined metrics. For instance, a pharmaceutical company will be more concerned with minimizing wrong positive diagnoses, and hence with high Specificity. On the other hand, an attrition model will be more concerned with Sensitivity. Confusion matrices are generally used only with class output models.

F1 Score

In the last section, we discussed precision and recall for classification problems and also highlighted the importance of choosing a precision/recall basis for our use case. What if, for a use case, we are trying to get the best precision and recall at the same time? F1-Score is the harmonic mean of precision and recall values for a classification problem. The formula for F1-Score is as follows:

$$F_1 = \frac{2}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}} = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

Now, an obvious question that comes to mind is why we take a harmonic mean and not an arithmetic mean. This is because the harmonic mean (HM) punishes extreme values more. Let us understand this with an example. We have a binary classification model with the following results:

Precision: 0, Recall: 1

Here, if we take the arithmetic mean, we get 0.5. It is clear that the above result comes from a dumb classifier that ignores the input and predicts one of the classes as output. Now, if we were to take HM, we would get 0 which is accurate as this model is useless for all purposes.

This seems simple. There are situations, however, in which a data scientist would like to give more importance/weight to either precision or recall. Altering the above expression a bit so that we can include an adjustable parameter beta for this purpose, we get:

Fbeta = (1 + β²) * (Precision * Recall) / (β² * Precision + Recall)

Fbeta measures the effectiveness of a model with respect to a user who attaches β times as much importance to recall as precision.
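As a minimal sketch, the F1 and Fbeta scores can be computed directly from precision and recall. The helper name `f_beta` and the zero-denominator guard are assumptions made for this illustration.

```python
def f_beta(precision, recall, beta=1.0):
    """F-beta score; beta=1 gives the F1 score (harmonic mean)."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0  # assumption: define the score as 0 when both inputs are 0
    return (1 + beta ** 2) * precision * recall / denominator


print(f_beta(0.0, 1.0))          # the degenerate classifier from the text
print(f_beta(0.5, 0.8))          # plain F1
print(f_beta(0.5, 0.8, beta=2))  # recall weighted more heavily than precision
```

With precision 0 and recall 1, the arithmetic mean would report 0.5, while this function returns 0, which matches the argument above.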

Gain and Lift Charts

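Gain and Lift charts are mainly concerned with checking the rank ordering of the probabilities. Here are the steps to build a Lift/Gain chart:

Step 1: Calculate the probability for each observation.

Step 2: Rank these probabilities in decreasing order.

Step 3: Build deciles, with each group having almost 10% of the observations.

Step 4: Calculate the response rate at each decile for responders, non-responders, and the total.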

You will get the following table from which you need to plot Gain/Lift charts:

This is a very informative table. The cumulative Gain chart is the graph between Cumulative %Right and Cumulative %Population. For the case in hand, here is the graph:

This graph tells you how well your model is segregating responders from non-responders. For example, the first decile, although it has only 10% of the population, has 14% of the responders. This means we have a 140% lift at the first decile.

What is the maximum lift we could have reached in the first decile? From the first table of this article, we know that the total number of responders is 3850. Also, the first decile will contain 543 observations. Hence, even if every observation in the first decile were a responder, it could capture at most 543/3850 ≈ 14.1% of the responders, that is, a maximum lift of roughly 141%. Hence, we are quite close to perfection with this model.

Let’s now plot the lift curve. The lift curve is the plot between total lift and %population. Note that for a random model, this always stays flat at 100%. Here is the plot for the case in hand:

You can also plot decile-wise lift with decile number:

What does this graph tell you? It tells you that our model does well till the 7th decile, after which every decile is skewed towards non-responders. Any model with decile-wise lift above 100% till at least the 3rd decile and at most the 7th decile is a good model. Otherwise, you might consider oversampling first.

Lift / Gain charts are widely used in campaign targeting problems. They tell us up to which decile we can target customers for a specific campaign. They also tell us how much response to expect from the new target base.
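To make the construction concrete, here is a rough sketch of how a cumulative gain and lift table could be computed. The function name `gain_lift_table` and the synthetic scores and labels are assumptions for this example; they are not the data used in this article.

```python
import random


def gain_lift_table(y_true, y_score, n_bins=10):
    """Rank by score (descending), cut into n_bins groups, and return rows of
    (bin, cumulative % population, cumulative % responders, cumulative lift %)."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    total_responders = sum(y for _, y in ranked)
    rows, cum_responders = [], 0
    for b in range(n_bins):
        start, end = b * n // n_bins, (b + 1) * n // n_bins
        cum_responders += sum(y for _, y in ranked[start:end])
        cum_pop_pct = 100.0 * end / n
        gain_pct = 100.0 * cum_responders / total_responders
        rows.append((b + 1, cum_pop_pct, gain_pct, 100.0 * gain_pct / cum_pop_pct))
    return rows


# Synthetic data: a higher score means a higher chance of being a responder.
random.seed(0)
scores = [random.random() for _ in range(1000)]
labels = [1 if random.random() < s else 0 for s in scores]

table = gain_lift_table(labels, scores)
for decile, pop_pct, gain_pct, lift_pct in table:
    print(f"{decile:>2}  {pop_pct:6.1f}  {gain_pct:6.1f}  {lift_pct:6.1f}")
```

Plotting the third column against the second gives the cumulative gain chart, and plotting the fourth against the second gives the lift curve, which ends at 100% once the whole population is included.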

Kolmogorov-Smirnov Chart

K-S or Kolmogorov-Smirnov chart measures the performance of classification models. More accurately, K-S is a measure of the degree of separation between the positive and negative distributions. The K-S is 100 if the scores partition the population into two separate groups in which one group contains all the positives and the other all the negatives.

On the other hand, if the model cannot differentiate between positives and negatives, then it is as if the model selects cases randomly from the population. The K-S would be 0. In most classification models, the K-S will fall between 0 and 100, and the higher the value, the better the model is at separating the positive from negative cases.
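A minimal sketch of the K-S computation follows. It scans observations one by one rather than by decile, which is an assumption of this example; a decile-based table gives a coarser version of the same number. The function name `ks_statistic` is hypothetical.

```python
def ks_statistic(y_true, y_score):
    """Maximum gap (in %) between the cumulative share of positives and the
    cumulative share of negatives, scanning from highest to lowest score."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y for _, y in ranked)
    n_neg = len(ranked) - n_pos
    cum_pos = cum_neg = 0
    best = 0.0
    for i, (score, y) in enumerate(ranked):
        if y == 1:
            cum_pos += 1
        else:
            cum_neg += 1
        # Only measure the gap at the end of a run of tied scores.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            best = max(best, abs(cum_pos / n_pos - cum_neg / n_neg))
    return 100.0 * best


print(ks_statistic([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # perfectly separated scores
print(ks_statistic([1, 0, 1, 0], [0.5, 0.5, 0.5, 0.5]))  # scores carry no information
```

The two calls reproduce the extremes described above: complete separation and no separation at all.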

For the case in hand, the following is the table:

We can also plot the %Cumulative Good and Bad to see the maximum separation. Following is a sample plot:

The evaluation metrics covered here are mostly used in classification problems. So far, we’ve learned about the confusion matrix, lift and gain chart, and Kolmogorov-Smirnov chart. Let’s proceed and learn a few more important metrics.

Area Under the ROC Curve (AUC – ROC)

Let’s first try to understand what the ROC (Receiver operating characteristic) curve is. If we look at the confusion matrix below, we observe that for a probabilistic model, we get different values for each metric.

Hence, for each sensitivity, we get a different specificity. The two vary as follows:

The ROC curve is the plot between sensitivity and (1- specificity). (1- specificity) is also known as the false positive rate, and sensitivity is also known as the True Positive rate. Following is the ROC curve for the case in hand.

Let’s take an example of threshold = 0.5 (refer to confusion matrix). Here is the confusion matrix:

As you can see, the sensitivity at this threshold is 99.6%, and the (1-specificity) is ~60%. This coordinate becomes one point in our ROC curve. To bring this curve down to a single number, we find the area under this curve (AUC).

Note that the area of the entire square is 1*1 = 1. Hence AUC itself is the ratio of the area under the curve to the total area. For the case in hand, we get AUC ROC as 96.4%. Following are a few thumb rules:

.90-1 = excellent (A)

.80-.90 = good (B)

.70-.80 = fair (C)

.60-.70 = poor (D)

.50-.60 = fail (F)

We see that we fall under the excellent band for the current model. But this might simply be over-fitting. In such cases, it becomes very important to do in-time and out-of-time validations.
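The construction of the ROC curve and its area can be sketched as follows. The six labels and scores are made up for illustration, and the helper names `roc_points` and `auc` are hypothetical.

```python
def roc_points(y_true, y_score):
    """(false positive rate, true positive rate) pairs, one per distinct threshold."""
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y for _, y in ranked)
    n_neg = len(ranked) - n_pos
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for i, (score, y) in enumerate(ranked):
        if y == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once per distinct score so ties share a threshold.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Made-up labels and scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(y_true, y_score)
print(curve)
print(auc(curve))
```

Each threshold contributes one (1-specificity, sensitivity) point, and the area is simply accumulated between consecutive points.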

Points to Remember:

1. A model which gives class as output will be represented as a single point in the ROC plot.

2. Such models cannot be compared with each other as the judgment needs to be taken on a single metric and not using multiple metrics. For instance, a model with parameters (0.2,0.8) and a model with parameters (0.8,0.2) can be coming out of the same model; hence these metrics should not be directly compared.

3. In the case of the probabilistic model, we were fortunate enough to get a single number which was AUC-ROC. But still, we need to look at the entire curve to make conclusive decisions. It is also possible that one model performs better in some regions and another performs better in others.

Advantages of Using ROC

Why should you use ROC and not metrics like the lift curve?

Lift is dependent on the total response rate of the population. Hence, if the response rate of the population changes, the same model will give a different lift chart. A solution to this concern can be a true lift chart (finding the ratio of lift and perfect model lift at each decile). But such a ratio rarely makes sense for the business.

The ROC curve, on the other hand, is almost independent of the response rate. This is because it has the two axes coming out from columnar calculations of the confusion matrix. The numerator and denominator of both the x and y axis will change on a similar scale in case of a response rate shift.

Log Loss

AUC ROC considers the predicted probabilities for determining our model’s performance. However, there is an issue with AUC ROC: it only takes into account the order of the probabilities, and hence it does not take into account the model’s capability to predict a higher probability for samples more likely to be positive. In that case, we could use the log loss, which is nothing but the negative average of the log of the corrected predicted probabilities for each instance:

Log Loss = -(1/N) * Σ [ yi * log(p(yi)) + (1 - yi) * log(1 - p(yi)) ]

Here, N is the number of observations.

p(yi) is the predicted probability of a positive class

1-p(yi) is the predicted probability of a negative class

yi = 1 for the positive class and 0 for the negative class (actual values)

Let us calculate log loss for a few random values to get the gist of the above mathematical function:

Log loss(1, 0.1) = 2.303

Log loss(1, 0.5) = 0.693

Log loss(1, 0.9) = 0.105

If we plot this relationship, we will get a curve as follows:

It’s apparent from the gentle downward slope towards the right that the Log Loss gradually declines as the predicted probability improves. Moving in the opposite direction, though, the Log Loss ramps up very rapidly as the predicted probability approaches 0.

So, the lower the log loss, the better the model. However, there is no absolute measure of a good log loss, and it is use-case/application dependent.

Whereas the AUC is computed with regards to binary classification with a varying decision threshold, log loss actually takes the “certainty” of classification into account.
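The three log loss values quoted above can be reproduced with a short sketch. The clipping constant `eps` is an assumption added here to avoid taking log(0); it is a common safeguard but not something this article specifies.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Negative mean log-likelihood for binary labels (natural log)."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # assumption: clip to keep log() finite
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)


for p in (0.1, 0.5, 0.9):
    print(f"Log loss(1, {p}) = {log_loss([1], [p]):.3f}")
```

For a single positive example, the expression reduces to -log(p), which is why the loss grows so quickly as the predicted probability approaches 0.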

Gini Coefficient

The Gini coefficient is sometimes used in classification problems. It can be derived straight away from the AUC ROC number. Gini is nothing but the ratio of the area between the ROC curve and the diagonal line to the area of the triangle above the diagonal. Following is the formula used:

Gini = 2*AUC – 1

Gini above 60% is a good model. For the case in hand, we get Gini as 92.7%.

Concordant – Discordant Ratio

This is, again, one of the most important evaluation metrics for any classification prediction problem. To understand this, let’s assume we have 3 students who have some likelihood of passing this year. Following are our predictions:

Now picture this: if we were to fetch pairs of two from these three students, how many pairs would we have? We will have 3 pairs: AB, BC, and CA. After the year ends, we see that A and C passed this year while B failed. Next, we choose all the pairs that contain one responder and one non-responder. How many such pairs do we have?

We have two pairs AB and BC. Now for each of the 2 pairs, the concordant pair is where the probability of the responder was higher than the non-responder. Whereas discordant pair is where the vice-versa holds true. In case both the probabilities were equal, we say it’s a tie. Let’s see what happens in our case :

Hence, we have 50% of concordant cases in this example. A concordant ratio of more than 60% is considered to be a good model. This metric is generally not used when deciding how many customers to target. It is primarily used to assess the model’s predictive power. Decisions like how many to target are again taken using KS / Lift charts.
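The pair counting can be sketched in Python as below. The probabilities assigned to students A, B, and C are assumed values for this illustration (the prediction table is not reproduced here); they are chosen so that one of the two mixed pairs is concordant and the other discordant. The function name `concordance` is hypothetical.

```python
from itertools import product


def concordance(y_true, y_score):
    """Fractions of concordant, discordant, and tied pairs among all
    (responder, non-responder) pairs."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    pairs = list(product(pos, neg))
    concordant = sum(p > n for p, n in pairs)
    discordant = sum(p < n for p, n in pairs)
    ties = len(pairs) - concordant - discordant
    return concordant / len(pairs), discordant / len(pairs), ties / len(pairs)


# Assumed probabilities for students A, B, C; A and C passed, B failed.
scores = {"A": 0.9, "B": 0.5, "C": 0.3}
passed = {"A": 1, "B": 0, "C": 1}
names = list(scores)
result = concordance([passed[n] for n in names], [scores[n] for n in names])
print(result)
```

Only the pairs AB and BC are formed, because the pair CA contains two responders and says nothing about rank ordering.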

Root Mean Squared Error (RMSE)

RMSE is the most popular evaluation metric used in regression problems. It follows an assumption that errors are unbiased and follow a normal distribution. Here are the key points to consider on RMSE:

The power of ‘square root’ empowers this metric to show large number deviations.

The ‘squared’ nature of this metric helps to deliver more robust results, which prevent canceling the positive and negative error values. In other words, this metric aptly displays the plausible magnitude of the error term.

It avoids the use of absolute error values, which is highly undesirable in mathematical calculations.

When we have more samples, reconstructing the error distribution using RMSE is considered to be more reliable.

RMSE is highly affected by outlier values. Hence, make sure you’ve removed outliers from your data set prior to using this metric.

As compared to mean absolute error, RMSE gives higher weightage and punishes large errors.

RMSE metric is given by:

RMSE = sqrt( (1/N) * Σ (Predicted_i - Actual_i)² )

where N is the Total Number of Observations.
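As a small sketch, RMSE can be computed next to mean absolute error (MAE) to show how a single large error affects each. The numbers below are invented for illustration.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error, shown only for comparison."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


actual = [10, 12, 14, 16]
close = [11, 11, 15, 15]         # every prediction is off by 1
with_outlier = [11, 11, 15, 26]  # the last prediction is off by 10

print(rmse(actual, close), mae(actual, close))
print(rmse(actual, with_outlier), mae(actual, with_outlier))
```

When all errors have the same size the two metrics agree, but a single large miss pulls RMSE up much more than MAE, which is the outlier sensitivity noted above.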

Root Mean Squared Logarithmic Error

In the case of Root mean squared logarithmic error, we take the log of the predictions and actual values. So basically, what changes is the variance that we are measuring. RMSLE is usually used when we don’t want to penalize huge differences between the predicted and the actual values when both the predicted and the true values are huge numbers.

If both predicted and actual values are small: RMSE and RMSLE are the same.
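A rough sketch of RMSLE is given below. It uses log(1 + value) rather than a plain log so that zeros are handled; this is a common convention but an assumption here, as is the requirement that inputs are non-negative.

```python
import math


def rmsle(actual, predicted):
    """RMSE computed on log(1 + value); inputs are assumed non-negative."""
    n = len(actual)
    return math.sqrt(
        sum((math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)) / n
    )


# The same 50% over-prediction at two very different scales.
print(rmsle([100], [150]))
print(rmsle([10000], [15000]))
```

The absolute gaps differ by a factor of 100 (50 versus 5000), yet the two RMSLE values are nearly equal, because the metric responds to the ratio between prediction and actual value rather than to the raw difference.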

R-Squared/Adjusted R-Squared

We learned that when the RMSE decreases, the model’s performance will improve. But these values alone are not intuitive.

In the case of a classification problem, if the model has an accuracy of 0.8, we could gauge how good our model is against a random model, which has an accuracy of 0.5. So the random model can be treated as a benchmark. But when we talk about the RMSE metrics, we do not have a benchmark to compare.

This is where we can use the R-Squared metric. The formula for R-Squared is as follows:

R-Squared = 1 - MSE(model) / MSE(baseline)

MSE(model): Mean Squared Error of the predictions against the actual values

MSE(baseline): Mean Squared Error of mean prediction against the actual values

In other words, how good is our regression model as compared to a very simple model that just predicts the mean value of the target from the train set as predictions?
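The comparison against the mean-predicting baseline can be sketched as follows. For simplicity this example takes the mean of the values being evaluated as the baseline prediction, which is an assumption; the text above describes using the mean of the training target.

```python
def r_squared(actual, predicted):
    """1 - MSE(model) / MSE(baseline), with a constant-mean baseline."""
    n = len(actual)
    mean_actual = sum(actual) / n  # assumption: baseline mean taken from these values
    mse_model = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    mse_baseline = sum((a - mean_actual) ** 2 for a in actual) / n
    return 1 - mse_model / mse_baseline


actual = [3.0, 5.0, 7.0, 9.0]
print(r_squared(actual, [2.8, 5.3, 6.9, 9.4]))  # a reasonably good model
print(r_squared(actual, [6.0, 6.0, 6.0, 6.0]))  # predicting the mean everywhere
print(r_squared(actual, actual))                # perfect predictions
```

A model that is worse than the baseline produces a negative value, since its MSE exceeds the baseline MSE.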

Adjusted R-Squared

A model performing equal to the baseline would give R-Squared as 0. The better the model, the higher the r2 value. The best model with all correct predictions would give R-Squared of 1. However, on adding new features to the model, the R-Squared value either increases or remains the same. R-Squared does not penalize for adding features that add no value to the model. So an improved version of the R-Squared is the adjusted R-Squared. The formula for adjusted R-Squared is given by:

Adjusted R-Squared = 1 - (1 - R²) * (n - 1) / (n - (k + 1))

k: number of features

n: number of samples

As you can see, this metric takes the number of features into account. When we add more features, the term in the denominator n-(k +1) decreases, so the whole expression increases.

If R-Squared does not increase, that means the feature added isn’t valuable for our model. So overall, we subtract a greater value from 1 and adjusted r2, in turn, would decrease.
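The penalty can be seen with a tiny sketch. The R-Squared value, sample count, and feature counts below are invented for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R-Squared for n samples and k features."""
    return 1 - (1 - r2) * (n - 1) / (n - (k + 1))


# Same R-Squared of 0.80 on 100 samples, with 5 and then 10 features.
print(adjusted_r_squared(0.80, n=100, k=5))
print(adjusted_r_squared(0.80, n=100, k=10))
```

Because R-Squared is held fixed while k grows, the second value is lower than the first: the extra features are penalized for adding nothing.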

Beyond these 12 evaluation metrics, there is another method to check the model performance. These metrics are statistically prominent in data science. But, with the arrival of machine learning, we are now blessed with more robust methods of model selection. Yes! I’m talking about Cross Validation.

Though cross-validation isn’t really an evaluation metric that is used openly to communicate model accuracy, the result of cross-validation provides a good enough intuitive result to generalize the performance of a model.

Let’s now understand cross-validation in detail.

Cross Validation

Let’s first understand the importance of cross-validation. Due to my busy schedule these days, I don’t get much time to participate in data science competitions. A long time back, I participated in TFI Competition on Kaggle. Without delving into my competition performance, I would like to show you the dissimilarity between my public and private leaderboard scores.

Here Is an Example of Scoring on Kaggle!

For the TFI competition, the following were three of my solutions and scores (the lesser, the better):

Over-fitting is nothing but your model becoming so complex that it starts capturing noise as well. This ‘noise’ adds no value to the model, only inaccuracy.

In the following section, I will discuss how you can know if a solution is an over-fit or not before we actually know the test set results.

The Concept of Cross-Validation

Cross Validation is one of the most important concepts in any type of data modeling. It simply says, try to leave a sample on which you do not train the model and test the model on this sample before finalizing the model.

The above diagram shows how to validate the model with the in-time sample. We simply divide the population into 2 samples and build a model on one sample. The rest of the population is used for in-time validation.

Could there be a negative side to the above approach?

I believe a negative side of this approach is that we lose a good amount of data from training the model. Hence, the model has very high bias, and this won’t give the best estimate for the coefficients. So what’s the next best option?

What if we make a 50:50 split of the training population, then train on the first 50 and validate on the remaining 50? Then, we train on the other 50 and test on the first 50. This way, we train the model on the entire population, although on only 50% in one go. This reduces bias because of sample selection to some extent but gives a smaller sample to train the model on. This approach is known as 2-fold cross-validation.

K-Fold Cross-Validation

Let’s extrapolate the last example from 2-fold to k-fold cross-validation. Now, we will try to visualize how a k-fold validation works.

This is a 7-fold cross-validation.

Here’s what goes on behind the scenes: we divide the entire population into 7 equal samples. Now we train models on 6 samples (green boxes) and validate on 1 sample (grey box). Then, at the second iteration, we train the model with a different sample held as validation. In 7 iterations, we have basically built a model on each sample and held each of them as validation. This is a way to reduce the selection bias and reduce the variance in prediction power. Once we have all 7 models, we take an average of the error terms to find which of the models is best.

How does this help to find the best (non-over-fit) model?

k-fold cross-validation is widely used to check whether a model is an overfit or not. If the performance metrics at each of the k rounds of modeling are close to each other and the mean of the metric is highest, the model is unlikely to be an overfit. In a Kaggle competition, you might rely more on the cross-validation score than the Kaggle public score. This way, you will be sure that the Public score is not just by chance.

How do we implement k-fold with any model?

Coding k-fold in R and Python is very similar. Here is how you code a k-fold in Python:

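What follows is a minimal, standard-library-only sketch of k-fold cross-validation, not the original listing. The helper names `k_fold_indices` and `cross_validate`, the toy data, and the "predict the training mean" model are all assumptions made for illustration; in practice a library splitter such as scikit-learn's `KFold` would typically play the role of `k_fold_indices`.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_validate(x, y, fit, predict, metric, k=7):
    """Train on k-1 folds, score on the held-out fold, and repeat k times."""
    scores = []
    for train, valid in k_fold_indices(len(x), k):
        model = fit([x[i] for i in train], [y[i] for i in train])
        preds = predict(model, [x[i] for i in valid])
        scores.append(metric([y[i] for i in valid], preds))
    return scores


# Toy stand-in for a real model: always predict the mean of the training target.
xs = list(range(70))
ys = [2.0 * v + 1.0 for v in xs]
fit = lambda x_train, y_train: statistics.mean(y_train)
predict = lambda model, x_valid: [model] * len(x_valid)
mse = lambda actual, pred: sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual)

scores = cross_validate(xs, ys, fit, predict, mse, k=7)
print(len(scores), statistics.mean(scores), statistics.pstdev(scores))
```

The mean of the fold scores is the cross-validation estimate, and the spread across folds indicates how stable the model is from one sample to another.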

But how do we choose k?

Think of extreme cases:

Generally, a value of k = 10 is recommended for most purposes.


Measuring the performance of the training sample is pointless. And leaving an in-time validation batch aside is a waste of data. K-Fold gives us a way to use every single data point, which can reduce this selection bias to a good extent. Also, K-fold cross-validation can be used with any modeling technique.

In addition, the metrics covered in this article are some of the most used metrics of evaluation in classification and regression problems.

Key Takeaways

Evaluation metrics measure the quality of the machine learning model.

For any project, evaluating machine learning models or algorithms is essential.

Frequently Asked Questions

Q1. What are the 3 metrics of evaluation?

A. Accuracy, confusion matrix, log-loss, and AUC-ROC are the most popular evaluation metrics.

Q2. What are evaluation metrics in machine learning?

A. Evaluation metrics quantify the performance of a machine learning model. It involves training a model and then comparing the predictions to expected values.

Q3. What are the 4 metrics for evaluating classifier performance?

A. Accuracy, confusion matrix, log-loss, and AUC-ROC are the most popular evaluation metrics used for evaluating classifier performance.


Why You Should No Longer Be Using Windows XP

Even though Windows XP was released way back in 2001, it’s still a pretty great operating system. It’s stable, has a Start button and gets the job done. That’s why there are literally hundreds of millions of computers that still have it installed. It’s so popular, in fact, that it’s the second most installed operating system in the world, only a little bit behind Windows 7.

Unfortunately, this isn’t really a good thing. The reason being Microsoft. Up till now, Microsoft has been extending the deadline for when it would drop support for Windows XP, but now it seems they are really going to kill it off. On April 8th, 2014, Microsoft will stop supporting Windows XP completely. This is big news because it means in about 4 months, there will be millions of computers that are going to be vulnerable to hackers.


Source: The Next Web

End of support means Microsoft will no longer provide any technical assistance to businesses or consumers for Windows XP troubleshooting. In addition and more importantly, Microsoft will no longer provide any security patches or updates for the operating system. On top of that, you won’t even be able to download Microsoft Security Essentials for Windows XP, the free antivirus software, after this date.

This is really bad news for anyone who has Windows XP installed after this date because there are literally hundreds of security vulnerabilities detected in Windows XP every year and once support ends, all of those security holes will be exploited by hackers and there literally won’t be anything to stop them.

Several Microsoft executives have also stated openly that businesses and users who do not update the operating system or buy a new PC will be open to many new attacks. One possible solution if you still have to use XP for whatever reason is to disconnect the computer from the Internet. Obviously, the PC can still be infected over the LAN network, but you’ll have a better chance than if it’s connected directly to the Internet.

For any business that needs support for Windows XP past the April 2014 deadline, another option is to install Windows Server 2003. Windows Server 2003 uses the same kernel as Windows XP and therefore can run all the same apps without any compatibility issues. Support for Windows Server 2003 does not end until July 14th, 2015, so you can get an extra year to upgrade your apps to a newer operating system.

As for consumers, according to Microsoft’s official statement, they would love for you to upgrade to Windows 8.1.

The other reason to upgrade from Windows XP to Windows 7 or Windows 8 is so that you can use the latest software and devices with your computer. XP is so old that a lot of new software simply will not run on it. In addition, some newer devices and gadgets may not be recognized by the system properly.

Upgrading an old PC to Windows 8 is actually not a bad idea. I wrote a post a while back on revitalizing an old PC by installing Windows 8 on it. The system requirements are pretty low, meaning you can install it on some fairly old hardware. Of course, you’ll have to buy Windows 8 Upgrade, which currently costs around $119, but that might be a better option than buying a new computer altogether.

If you do install Windows 8 and you get any kind of error about the CPU not being compatible, check out the link. I’ve personally installed Windows 8.1 on a couple of old desktops at home and they work great for browsing, email, watching videos, reading news, etc. With Windows 8.1, you also kind of get the Start button back, so if you have been holding back because of the lack of a Start button, it’s not that bad anymore in 8.1.
