# Gradient Boosting Algorithm: A Complete Guide For Beginners


This article was published as a part of the Data Science Blogathon


In this article, I am going to discuss the math intuition behind the Gradient Boosting algorithm, more popularly known as the Gradient Boosting Machine or GBM. It is a boosting method, and I have talked more about boosting in this article.

Gradient boosting is a method that stands out for its prediction speed and accuracy, particularly with large and complex datasets. From Kaggle competitions to machine learning solutions for business, this algorithm has produced top results. We already know that errors play a major role in any machine learning algorithm. There are mainly two types of error: bias error and variance error. The gradient boosting algorithm helps us minimize the bias error of the model.

Before getting into the details of this algorithm, we must have some knowledge of the AdaBoost algorithm, which is again a boosting method. This algorithm starts by assigning equal weights to all the data points and building a decision stump. It then increases the weights of the points that are misclassified and lowers the weights of those that are correctly classified or easy to classify. A new decision stump is built on these weighted data points. The idea behind this is to improve the predictions made by the first stump. I have talked more about this algorithm here; read that article before this one to get a better understanding.

The main difference between these two algorithms is that Gradient boosting has a fixed base estimator i.e., Decision Trees whereas in AdaBoost we can change the base estimator according to our needs.

Table of Contents

What is Boosting technique?

Gradient Boosting Algorithm

Gradient Boosting Regressor

Example of gradient boosting

Gradient Boosting Classifier

Implementation using Scikit-learn

Parameter Tuning in Gradient Boosting (GBM) in Python

End Notes


About the Author

What is boosting?

While studying machine learning you must have come across the term Boosting. It is one of the most misinterpreted terms in the field of Data Science. The principle behind boosting algorithms is that we first build a model on the training dataset, and then a second model is built to rectify the errors present in the first model. Let me try to explain what exactly this means and how it works.

Suppose you have n data points and 2 output classes (0 and 1). You want to create a model to detect the class of the test data. What we do is randomly select observations from the training dataset and feed them to model 1 (M1); we also assume that initially all the observations have an equal weight, which means an equal probability of getting selected.

Remember, in ensemble techniques weak learners combine to make a strong model, so here M1, M2, M3, …, Mn are all weak learners.

Since M1 is a weak learner, it will surely misclassify some of the observations. Now, before feeding the observations to M2, what we do is update the weights of the observations that were wrongly classified. You can think of it as a bag that initially contains 10 different colored balls, but after some time a kid takes out his favorite colored ball and puts 4 red balls inside the bag instead. Now, of course, the probability of selecting a red ball is higher. This same phenomenon happens in boosting techniques: when an observation is wrongly classified, its weight gets increased, and for those which are correctly classified, their weights get decreased. The probability of selecting a wrongly classified observation increases, hence in the next model mostly those observations get selected which were misclassified in model 1.

Similarly, it happens with M2: the weights of the wrongly classified observations are again updated and then fed to M3. This procedure is continued until the errors are minimized and the dataset is predicted correctly. Now when a new data point comes in (test data), it passes through all the models (weak learners), and the class which gets the highest vote is the output for our test data.
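As a rough sketch of this reweighting idea (the exact AdaBoost update uses the model's error rate; the doubling/halving factor here is purely illustrative):

```python
import numpy as np

# Illustrative reweighting (NOT the exact AdaBoost formula): misclassified
# points get a larger sampling weight, correctly classified points a smaller
# one, and the weights are renormalized into a probability distribution.
def update_weights(weights, misclassified, factor=2.0):
    weights = np.where(misclassified, weights * factor, weights / factor)
    return weights / weights.sum()

w = np.full(4, 0.25)                         # equal initial weights
miss = np.array([True, False, False, True])  # points M1 got wrong
w = update_weights(w, miss)
print(w)  # the misclassified points are now more likely to be sampled for M2
```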

What is a Gradient boosting Algorithm?

The main idea behind this algorithm is to build models sequentially and these subsequent models try to reduce the errors of the previous model. But how do we do that? How do we reduce the error? This is done by building a new model on the errors or residuals of the previous model.

When the target column is continuous, we use a Gradient Boosting Regressor, whereas when it is a classification problem, we use a Gradient Boosting Classifier. The only difference between the two is the loss function. The objective is to minimize this loss function by adding weak learners using gradient descent. Since the method is built around the loss function, for regression problems we will have loss functions like mean squared error (MSE), and for classification we will have a different one, e.g. log-likelihood.

Understand Gradient Boosting Algorithm with example

Let’s understand the intuition behind Gradient boosting with the help of an example. Here our target column is continuous hence we will use Gradient Boosting Regressor.

Following is a sample from a random dataset where we have to predict the car price based on various features. The target column is price and other features are independent features.


Step 1: The first step in gradient boosting is to build a base model to predict the observations in the training dataset. For simplicity, we take the average of the target column and assume that to be the predicted value, as shown below:


Why did I say we take the average of the target column? Well, there is math involved behind this. Mathematically, the first step can be written as:

F0(x) = argmin_gamma Σi L(yi, gamma)

where the sum runs over all n observations in the training data.

Looking at this may give you a headache, but don’t worry we will try to understand what is written here.

Here L is our loss function

Gamma is our predicted value

argmin means we have to find a predicted value/gamma for which the loss function is minimum.

Since the target column is continuous, our loss function will be the squared error:

L = Σi (1/2) * (yi − gamma)^2

Here yi is the observed value

And gamma is the predicted value

Now we need to find the value of gamma for which this loss function is minimum. We all studied how to find minima and maxima in 12th grade: we differentiate the function and set the derivative equal to 0. Yes, we will do the same here.

Let’s see how to do this with the help of our example. Remember that y_i is our observed value and gamma_i is our predicted value, by plugging the values in the above formula we get:

We end up with the average of the observed car prices, and this is why I asked you to take the average of the target column and assume it to be your first prediction.

Hence, for gamma = 14500 the loss function is minimum, so this value becomes our prediction for the base model.
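To make this concrete, here is a quick check (with hypothetical car prices chosen so their average is 14500, matching the example) that the mean is the constant minimizing the squared-error loss:

```python
import numpy as np

# Hypothetical prices whose mean matches the example's 14500
prices = np.array([12000.0, 14000.0, 16000.0, 16000.0])

f0 = prices.mean()          # base model prediction (gamma)
loss = lambda g: ((prices - g) ** 2).sum()

print(f0)  # 14500.0
# The mean gives a lower total squared loss than nearby constants
assert loss(f0) < loss(f0 + 100) and loss(f0) < loss(f0 - 100)
```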

Step 2: The next step is to calculate the pseudo-residuals, which are (observed value − predicted value):

r_im = − [ ∂L(yi, F(xi)) / ∂F(xi) ] evaluated at F(x) = F_{m−1}(x), for i = 1, …, n

Here F(xi) is the previous model and m is the number of DT made.

We are just taking the derivative of loss function w.r.t the predicted value and we have already calculated this derivative:

If you look at the formula of the residuals above, you see that the derivative of the loss function is multiplied by a negative sign, so now we get:

r_im = yi − F_{m−1}(xi), i.e. residual = observed value − predicted value

The predicted value here is the prediction made by the previous model. In our example, the prediction made by the previous model (the initial base model prediction) is 14500, so to calculate the residuals our formula becomes:

residual = observed price − 14500

Step 3: In the next step, we will build a model on these pseudo-residuals and make predictions. Why do we do this? Because we want to minimize these residuals, and minimizing the residuals will eventually improve our model's accuracy and prediction power. So, using the residuals as the target and the original features (cylinder number, cylinder height, and engine location), we will generate new predictions. Note that the predictions in this case will be the error values, not the predicted car prices, since our target column is now the error.
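A minimal sketch of this step, using a single made-up feature (car height; the values are hypothetical) and the residuals from the example as the target:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[48.8], [52.0], [49.5], [54.1]])   # hypothetical car heights
price = np.array([12000.0, 14000.0, 16000.0, 16000.0])

residuals = price - 14500.0                      # pseudo-residuals (the target is the error now)
h1 = DecisionTreeRegressor(max_depth=1).fit(X, residuals)

print(h1.predict(X))  # predicted residuals, not predicted car prices
```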

Let’s say hm(x) is our DT made on these residuals.

Step 4: In this step we find the output values for each leaf of our decision tree. There might be a case where one leaf gets more than one residual, so we need to find the final output of all the leaves. To find the output, we simply take the average of all the numbers in a leaf, whether the leaf contains one value or several.

Let’s see why we take the average of all the numbers. Mathematically, this step can be represented as:

gamma_jm = argmin_gamma Σ over xi in R_jm of L(yi, F_{m−1}(xi) + gamma)

Here hm(xi) is the DT made on residuals and m is the number of DT. When m=1 we are talking about the 1st DT and when it is “M” we are talking about the last DT.

The output value for each leaf is the value of gamma that minimizes the loss function. On the left-hand side, gamma is the output value of a particular leaf. On the right-hand side, [F_{m−1}(xi) + gamma] is similar to step 1, but here the difference is that we take the previous prediction into account, whereas earlier there was no previous prediction.


We see that the 1st residual goes in R1,1, the 2nd and 3rd residuals go in R2,1, and the 4th residual goes in R3,1.

Let’s calculate the output for the first leaf, R1,1:

Now we need to find the value for gamma for which this function is minimum. So we find the derivative of this equation w.r.t gamma and put it equal to 0.

Hence the leaf R1,1 has an output value of -2500. Now let’s solve for the R2,1

Let’s take the derivative to get the minimum value of gamma for which this function is minimum:

We end up with the average of the residuals in the leaf R2,1 . Hence if we get any leaf with more than 1 residual, we can simply find the average of that leaf and that will be our final output.

Now after calculating the output of all the leaves, we get:


Step 5: This is finally the last step, where we update the predictions of the previous model. The update is:

F_m(x) = F_{m−1}(x) + nu * h_m(x)

where m is the number of decision trees made.

Since we have just started building our model so our m=1. Now to make a new DT our new predictions will be:


Here F_{m−1}(x) is the prediction of the previous model. Since m = 1, F_{m−1} = F_0, which is our base model, so the previous prediction is 14500.

nu is the learning rate, usually selected between 0 and 1. It reduces the effect each tree has on the final prediction, and this improves accuracy in the long run. Let’s take nu = 0.1 in this example.

Hm(x) is the recent DT made on the residuals.

Let’s calculate the new prediction now:


Suppose we want to find a prediction of our first data point which has a car height of 48.8. This data point will go through this decision tree and the output it gets will be multiplied with the learning rate and then added to the previous prediction.
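Plugging in the numbers from the example: the first data point falls in leaf R1,1, whose output is −2500, and with the learning rate nu = 0.1 the updated prediction is

F1(x) = F0(x) + nu * h1(x) = 14500 + 0.1 × (−2500) = 14250

so one boosting step moves the prediction from 14500 toward the observed price of 12000.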

Now let’s say m=2 which means we have built 2 decision trees and now we want to have new predictions.

This time we will add the previous prediction, that is F1(x), to the new DT made on the residuals. We will iterate through these steps again and again until the loss is negligible.
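The whole procedure described above can be sketched in a few lines (squared loss, depth-1 trees, and made-up data; this is an illustration of the steps, not a production implementation):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gbm_fit(X, y, n_trees=2, lr=0.1):
    """Steps 1-5: base prediction = mean, then trees fitted on residuals."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred                                         # step 2: pseudo-residuals
        tree = DecisionTreeRegressor(max_depth=1).fit(X, residuals)  # step 3: tree on residuals
        pred = pred + lr * tree.predict(X)                           # step 5: F_m = F_{m-1} + nu*h_m
        trees.append(tree)
    return f0, trees

def gbm_predict(X, f0, trees, lr=0.1):
    # A new point passes through every tree; scaled outputs are added to the base
    return f0 + lr * sum(t.predict(X) for t in trees)

X = np.array([[48.8], [52.0], [49.5], [54.1]])   # hypothetical feature (car height)
y = np.array([12000.0, 14000.0, 16000.0, 16000.0])
f0, trees = gbm_fit(X, y)
print(gbm_predict(X, f0, trees))                 # closer to y than the flat 14500
```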


If a new data point, say height = 1.40, comes in, it'll go through all the trees and then give the prediction. Here we have only 2 trees, hence the data point will go through these 2 trees and the final output will be F2(x).

What is Gradient Boosting Classifier?

A gradient boosting classifier is used when the target column is binary. All the steps explained in the Gradient boosting regressor are used here, the only difference is we change the loss function. Earlier we used Mean squared error when the target column was continuous but this time, we will use log-likelihood as our loss function.

Let’s see how this loss function works, to read more about log-likelihood I recommend you to go through this article where I have given each detail you need to understand this.

The loss function for the classification problem is given below:

Our first step in the gradient boosting algorithm was to initialize the model with some constant value, there we used the average of the target column but here we’ll use log(odds) to get that constant value. The question comes why log(odds)?

When we differentiate this loss function, we will get a function of log(odds) and then we need to find a value of log(odds) for which the loss function is minimum.

Confused right? Okay let’s see how it works:

Let’s first transform this loss function so that it is a function of log(odds), I’ll tell you later why we did this transformation.

Now this is our loss function, and we need to minimize it, for this, we take the derivative of this w.r.t to log(odds) and then put it equal to 0,

Here y are the observed values

You must be wondering that why did we transform the loss function into the function of log(odds). Actually, sometimes it is easy to use the function of log(odds), and sometimes it’s easy to use the function of predicted probability “p”.

It is not compulsory to transform the loss function, we did this just to have easy calculations.

Hence the minimum value of this loss function will be our first prediction (base model prediction)
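As a quick numeric check with a toy binary target (labels chosen arbitrarily): the constant log(odds) computed from the class proportions is the base prediction, and applying the sigmoid gives back the base probability:

```python
import numpy as np

y = np.array([1, 1, 1, 0])          # toy binary target: 3 positives, 1 negative

p = y.mean()                        # fraction of positives = 0.75
log_odds = np.log(p / (1 - p))      # base model prediction F0, in log-odds space
print(log_odds)                     # log(3)

prob = 1 / (1 + np.exp(-log_odds))  # sigmoid recovers the base probability
assert np.isclose(prob, p)
```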

Now in the Gradient boosting regressor our next step was to calculate the pseudo residuals where we multiplied the derivative of the loss function with -1. We will do the same but now the loss function is different, and we are dealing with the probability of an outcome now.

After finding the residuals we can build a decision tree with all independent variables and target variables as “Residuals”.

Now when we have our first decision tree, we find the final output of the leaves, because there might be a case where a leaf gets more than one residual, so we need to calculate the final output value. The math behind this step is out of the scope of this article, so I will mention the direct formula to calculate the output of a leaf:

Finally, we are ready to get new predictions by adding our base model with the new tree we made on residuals.

There are a few variations of gradient boosting, and a couple of them are briefly explained in the coming article.

Implementation Using scikit-learn

The task here is to classify the income of an individual, when given the required inputs about his personal life.

First, let’s import all required libraries.

# Import all relevant libraries
from sklearn.ensemble import GradientBoostingClassifier
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn import preprocessing
import warnings
warnings.filterwarnings("ignore")

Now let's read the dataset and look at the columns to understand the information better.

df = pd.read_csv('income_evaluation.csv')
df.head()

I have already done the data preprocessing part; my main aim here is to show you how to implement this in Python. Now, for training and testing our model, the data has to be divided into train and test sets.

We will also scale the data to lie between 0 and 1.

# Split dataset into test and train data
X_train, X_test, y_train, y_test = train_test_split(df.drop('income', axis=1), df['income'], test_size=0.2)

Now let’s go ahead with defining the Gradient Boosting Classifier along with it’s hyperparameters. Next, we will fit this model on the training data.

# Define Gradient Boosting Classifier with hyperparameters
gbc = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05, random_state=100, max_features=5)
# Fit train data to GBC
gbc.fit(X_train, y_train)

The model has been trained and we can now observe the outputs as well.

Below, you can see the confusion matrix of the model, which gives a report of the number of classifications and misclassifications.

# Confusion matrix will give number of correct and incorrect classifications
print(confusion_matrix(y_test, gbc.predict(X_test)))

# Accuracy of model
print("GBC accuracy is %2.2f" % accuracy_score(y_test, gbc.predict(X_test)))

Let’s check the classification report also:

from sklearn.metrics import classification_report

pred = gbc.predict(X_test)
print(classification_report(y_test, pred))

Parameter Tuning in Gradient Boosting (GBM) in Python

Tuning n_estimators and learning rate

n_estimators is the number of trees (weak learners) that we want to add to the model. There is no single optimum value for the learning rate, as low values almost always work better, provided we train a sufficient number of trees. A high number of trees can be computationally expensive, which is why I have taken a small number of trees here.

from sklearn.model_selection import GridSearchCV

grid = {
    'learning_rate': [0.01, 0.05, 0.1],
    'n_estimators': np.arange(100, 500, 100),
}
gb = GradientBoostingClassifier()
gb_cv = GridSearchCV(gb, grid, cv=4)
gb_cv.fit(X_train, y_train)
print("Best Parameters:", gb_cv.best_params_)
print("Train Score:", gb_cv.best_score_)
print("Test Score:", gb_cv.score(X_test, y_test))

We see the accuracy increased from 86% to 89% after tuning n_estimators and the learning rate. The true positive and true negative rates also improved.

We can also tune max_depth parameter which you must have heard in decision trees and random forests.

grid = {'max_depth': [2, 3, 4, 5, 6, 7]}
gb = GradientBoostingClassifier(learning_rate=0.1, n_estimators=400)
gb_cv = GridSearchCV(gb, grid, cv=4)
gb_cv.fit(X_train, y_train)
print("Best Parameters:", gb_cv.best_params_)
print("Train Score:", gb_cv.best_score_)
print("Test Score:", gb_cv.score(X_test, y_test))

The accuracy has increased even more when we tuned the parameter “max_depth”.

End Notes

I hope you now have an understanding of how the gradient boosting algorithm works under the hood. I have tried to show you the math behind it in the easiest way possible.

In the next article, I will explain eXtreme Gradient Boosting (XGBoost), which is again a technique to combine various models and improve our accuracy score. It is an extension of the gradient boosting algorithm.

About the Author

I am an undergraduate student currently in my last year majoring in Statistics (Bachelors of Statistics) and have a strong interest in the field of data science, machine learning, and artificial intelligence. I enjoy diving into data to discover trends and other valuable insights about the data. I am constantly learning and motivated to try new things.

I am open to collaboration and work.

For any doubt and queries, feel free to contact me on Email

Connect with me on LinkedIn and Twitter

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.



Variants Of Gradient Descent Algorithm


In a Neural Network, the Gradient Descent Algorithm is used during the backward propagation to update the parameters of the model. This article is completely focused on the variants of the Gradient Descent Algorithm in detail. Without any delay, let’s start!


Recap: What is the equation of the Gradient Descent Algorithm?

This is the update equation for the gradient descent algorithm:

θ = θ − α * (dJ/dθ)

Here θ is the parameter we wish to update, dJ/dθ is the partial derivative which tells us the rate of change of error on the cost function with respect to the parameter θ and α here is the Learning Rate. I hope you are familiar with these terms, if not then I would recommend you to first go through this article on Understanding Gradient Descent Algorithm.

So, this J here represents the cost function and there are multiple ways to calculate this cost. Based on the way we are calculating this cost function there are different variants of Gradient Descent.

Batch Gradient Descent

Let’s say there are a total of ‘m’ observations in a data set and we use all these observations to calculate the cost function J, then this is known as Batch Gradient Descent.

So we take the entire training set, perform forward propagation and calculate the cost function. And then we update the parameters using the rate of change of this cost function with respect to the parameters. An epoch is when the entire training set is passed through the model, forward propagation and backward propagation are performed and the parameters are updated. In batch Gradient Descent since we are using the entire training set, the parameters will be updated only once per epoch.
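A minimal batch gradient descent loop for linear regression with an MSE cost (the data here is synthetic and noise-free, and all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))           # m = 100 observations, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                          # noise-free targets for illustration

w = np.zeros(3)
alpha = 0.1                             # learning rate
for epoch in range(200):
    grad = 2 * X.T @ (X @ w - y) / len(y)   # dJ/dw computed over ALL observations
    w -= alpha * grad                        # one parameter update per epoch
print(w)                                     # approaches true_w
```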

Stochastic Gradient Descent(SGD)

If you use a single observation to calculate the cost function it is known as Stochastic Gradient Descent, commonly abbreviated as SGD. We pass a single observation at a time, calculate the cost and update the parameters.

Let’s say we have 5 observations and each observation has three features and the values that I’ve taken are completely random.

Now, if we use SGD, we will take the first observation, pass it through the neural network, calculate the error, and then update the parameters.

Then we will take the second observation and perform similar steps with it. This step will be repeated until all observations have been passed through the network and the parameters have been updated.

Each time the parameters are updated, it is known as an iteration. Here, since we have 5 observations, the parameters will be updated 5 times; that is, there will be 5 iterations. Had this been batch gradient descent, we would have passed all the observations together and the parameters would have been updated only once. In the case of SGD, there will be 'm' iterations per epoch, where 'm' is the number of observations in the dataset.
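The same kind of problem solved with SGD, updating the parameters once per observation (5 updates per epoch for 5 observations; data and names are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 3))             # 5 observations, 3 features
true_w = np.array([1.0, 2.0, -1.0])
y = X @ true_w                          # noise-free targets for illustration

w = np.zeros(3)
alpha = 0.05
for epoch in range(1000):
    for i in range(len(y)):             # 5 iterations (updates) per epoch
        grad = 2 * X[i] * (X[i] @ w - y[i])   # gradient from a SINGLE observation
        w -= alpha * grad
print(w)                                 # approaches true_w
```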

So far we’ve seen that if we use the entire dataset to calculate the cost function, it is known as Batch Gradient Descent and if use a single observation to calculate the cost it is known as SGD.

Mini-batch Gradient Descent

Another type of Gradient Descent is the Mini-batch Gradient Descent. It takes a subset of the entire dataset to calculate the cost function. So if there are ‘m’ observations then the number of observations in each subset or mini-batches will be more than 1 and less than ‘m’.

Again let’s take the same example. Assume that the batch size is 2. So we’ll take the first two observations, pass them through the neural network, calculate the error and then update the parameters.

Then we will take the next two observations and perform similar steps, i.e., pass them through the network, calculate the error, and update the parameters.

Since only a single observation is left for the final iteration, the parameters will be updated using that one observation alone.
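And mini-batch gradient descent on the same kind of setup with a batch size of 2, which gives three updates per epoch over 5 observations (2 + 2 + 1, the last batch holding the single leftover row); data is synthetic:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 3))              # 5 observations, 3 features
true_w = np.array([0.5, 1.5, -2.0])
y = X @ true_w                           # noise-free targets for illustration

w = np.zeros(3)
alpha, batch_size = 0.05, 2
for epoch in range(1000):
    for start in range(0, len(y), batch_size):   # batches of 2, 2, and 1
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        grad = 2 * xb.T @ (xb @ w - yb) / len(yb)
        w -= alpha * grad
print(w)                                  # approaches true_w
```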

Comparison between Batch GD, SGD, and Mini-batch GD:

This is a brief overview of the different variants of Gradient Descent. Now let’s compare these different types with each other:

Comparison: Number of Observations Used per Update

In the case of Stochastic Gradient Descent, we update the parameters after every single observation and we know that every time the weights are updated it is known as an iteration.

In the case of Mini-batch Gradient Descent, we take a subset of data and update the parameters based on every subset.

Comparison: Cost function

Since we update the parameters using the entire dataset in the case of batch GD, the cost function in this case reduces smoothly.

On the other hand, the updates in the case of SGD are not that smooth. Since we update the parameters based on a single observation, there are a lot of iterations. It is also possible that the model starts learning noise.

The cost function in the case of mini-batch gradient descent is smoother than in SGD, since we are not updating the parameters after every single observation but after every subset of the data.

Comparison: Computation Cost and Time

Now, coming to the computation cost and time taken by these variants of gradient descent: since we have to load the entire dataset at a time, perform forward propagation on it, calculate the error, and then update the parameters, the computation cost per update in the case of batch gradient descent is very high.

The computation cost per update in the case of SGD is less than that of batch gradient descent, since we load only a single observation at a time, but the total computation time increases because there are many more updates and hence many more iterations.

In the case of mini-batch gradient descent, taking a subset of the data means there are fewer iterations or updates, and hence the computation time is less than in SGD. Also, since we are not loading the entire dataset at a time but a subset of it, the computation cost per update is also less than in batch gradient descent. This is the reason why people usually prefer mini-batch gradient descent. In practice, whenever we say stochastic gradient descent we often actually mean mini-batch gradient descent.


End Notes

In this article, we saw the variants of the gradient descent algorithm in detail. We also compared all of them with each other and found that mini-batch GD is the most commonly used variant of gradient descent.

If you are looking to kick start your Data Science Journey and want every topic under one roof, your search stops here. Check out Analytics Vidhya’s Certified AI & ML BlackBelt Plus Program


A Comprehensive Guide To Sharding In Data Engineering For Beginners

This article was published as a part of the Data Science Blogathon.

Big Data is a very commonly heard term these days. A reasonably large volume of data that cannot be handled on a small capacity configuration of servers can be called ‘Big Data’ in that particular context. In today’s competitive world, every business organization relies on decision-making based on the outcome of the analyzed data they have on hand. The data pipeline starting from the collection of raw data to the final deployment of machine learning models based on this data goes through the usual steps of cleaning, pre-processing, processing, storage, model building, and analysis. Efficient handling and accuracy depend on resources like software, hardware, technical workforce, and costs. Answering queries requires specific data probing in either static or dynamic mode with consistency, reliability, and availability. When data is large, inadequacy in handling queries due to the size of data and low capacity of machines in terms of speed, memory may prove problematic for the organization. This is where sharding steps in to address the above problems.

This guide explores the basics and various facets of data sharding, the need for sharding, and its pros, and cons.

What is Data Sharding?

With the increasing use of IT technologies, data is accumulating at an overwhelmingly faster pace. Companies leverage this big data for data-driven decision-making. However, with the increased size of the data, system performance suffers due to queries becoming very slow if the dataset is entirely stored in a single database. This is why data sharding is required.


In simple terms, sharding is the process of dividing and storing a single logical dataset into databases that are distributed across multiple computers. This way, when a query is executed, a few computers in the network may be involved in processing the query, and the system performance is faster. With increased traffic, scaling the databases becomes non-negotiable to cope with the increased demand. Furthermore, several sharding solutions allow for the inclusion of additional computers. Sharding allows a database cluster to grow with the amount of data and traffic received.

Let’s look at some key terms used in the sharding of databases.

Scale-out and Scaling up: Scale-out is the process of adding (or removing) database servers horizontally to improve performance and increase capacity. Scaling up refers to the practice of adding physical resources, like memory, storage, and CPU, to an existing database server to improve performance.

Sharding: Sharding distributes similarly-formatted large data over several separate databases.

Chunk: A chunk is made up of sharded data subset and is bound by lower and higher ranges based on the shard key.

Shard: A shard is a horizontally distributed portion of data in a database. Data collections with the same partition keys are called logical shards, which are then distributed across separate database nodes.

Sharding Key: A sharding key is a column of the database to be sharded. This key is responsible for partitioning the data. It can be either a single indexed column or multiple columns denoted by a value that determines the data division between the shards. A primary key can be used as a sharding key; however, a sharding key need not be a primary key. The choice of the sharding key depends on the application. For example, userID could be used as a sharding key in banking or social media applications.
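For illustration, here is a toy key-based router (the shard count and key names are hypothetical): hashing the sharding key decides which shard a row lives on, so rows with the same key always map to the same shard.

```python
import hashlib

N_SHARDS = 4  # hypothetical number of physical shards

def shard_for(user_id: str) -> int:
    """Map a sharding key (userID here) to a shard number via a hash."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % N_SHARDS

# Deterministic: the same key is always routed to the same shard
print(shard_for("user42"))
assert shard_for("user42") == shard_for("user42")
```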

Logical shard and Physical Shard: A chunk of the data with the same shard key is called a logical shard. When a single server holds one or more than one logical shard, it is called a physical shard.

Shard replicas: These are the copies of the shard and are allotted to different nodes.

Partition Key: It is a key that defines the pattern of data distribution in the database. Using this key, it is possible to direct the query to the concerned database for retrieving and manipulating the data. Data having the same partition key is stored in the same node.

Replication: It is a process of copying and storing data from a central database at more than one node.

Resharding: It is the process of redistributing the data across shards to adapt to the growing size of data.

Are Sharding and Partitioning the same?

Both sharding and partitioning split a database into smaller datasets, but they are not the same. Sharding distributes the data across several machines, whereas partitioning groups subsets of data within a single unsharded database. Hence, the phrases sharding and partitioning are used interchangeably when the terms "horizontal" and "vertical" precede them. As a result, "horizontal sharding" and "horizontal partitioning" are interchangeable terms.

Vertical Sharding:

Entire columns are split and placed in new, different tables in a vertically partitioned table. The data in one vertical split is different from the data in the others, and each contains distinct rows and columns.

Horizontal Sharding:

Horizontal sharding or horizontal partitioning divides a table’s rows into multiple tables or partitions. Every partition has the same schema and columns but distinct rows. Here, the data stored in each partition is distinct and independent of the data stored in other partitions.

The image below shows how a table can be partitioned both horizontally and vertically.

The Process

Before sharding a database, it is essential to evaluate the requirements for selecting the type of sharding to be implemented.

At the start, we need to have a clear idea about the data and how the data will be distributed across shards. The answer is crucial as it will directly impact the performance of the sharded database and its maintenance strategy.

Next, the nature of queries that need to be routed through these shards should also be known. For read-heavy workloads, replication is a better and more cost-effective option than sharding the database. On the other hand, a workload involving write queries, or both reads and writes, would require sharding of the database.

The final point to be considered is shard maintenance. As the accumulated data increases, it needs to be redistributed, and the number of shards keeps growing over time. Hence, the distribution of data across shards requires a strategy that is planned ahead, to keep the sharding process efficient.

Types of Sharding Architectures

Once you have decided to shard the existing database, the following step is to figure out how to achieve it. It is crucial that during query execution or distributing the incoming data to sharded tables/databases, it goes to the proper shard. Otherwise, there is a possibility of losing the data or experiencing noticeably slow searches. In this section, we will look at some commonly used sharding architectures, each of which has a distinct way of distributing data between shards. There are three main types of sharding architectures – Key or Hash-Based, Range Based, and Directory-Based sharding.

To understand these sharding strategies, say there is a company that handles databases for clients who sell their products in different countries. The handled database might look like this and can often extend to more than a million rows.

We will take a few rows from the above table to explain each sharding strategy.

So, to store and query these databases efficiently, we need to implement sharding on these databases for low latency, fault tolerance, and reliability.

Key Based Sharding

Key Based Sharding or Hash-Based Sharding, uses a value from the column data — like customer ID, customer IP address, a ZIP code, etc. to generate a hash value to shard the database. This selected table column is the shard key. Next, all row values in the shard key column are passed through the hash function.

This hash function is a mathematical function that takes an input of any size (usually a combination of numbers and strings) and returns a unique output called a hash value. The hash value, together with the total number of available shards, determines which shard the data should be sent to.

It is important to remember that a shard key needs to be both unique and static, i.e., it should not change over a period of time. Otherwise, it would increase the amount of work required for update operations, thus slowing down performance.

The Key Based Sharding process looks like this:

Image Source: Author
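The routing step described above can be sketched in a few lines of Python. The customer IDs, the shard count, and the use of MD5 as the hash function are illustrative assumptions, not part of the article's example:

```python
import hashlib

NUM_SHARDS = 4  # assumed number of physical shards

def shard_for(key: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a shard key (e.g. a customer ID) to a shard number."""
    # MD5 gives a stable hash across runs, unlike Python's built-in hash()
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

for key in ("customer-101", "customer-102", "customer-103"):
    print(key, "-> shard", shard_for(key))
```

Note that changing `num_shards` changes the result of the modulo for most keys, which is exactly the remapping problem this strategy runs into when shards are added or removed.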

Features of Key Based Sharding are-

Hash values are easy to generate using standard algorithms, and because they distribute data roughly equally among the available shards, this approach is good at load balancing.

As all shards share the same load, it helps to avoid database hotspots (when one shard contains excessive data as compared to the rest of the shards).

Additionally, in this type of sharding, there is no need to have any additional map or table to hold the information of where the data is stored.

However, this is not dynamic sharding, and it can be difficult to add or remove servers from a database as application requirements change. Adding or removing servers changes the number of shards, and therefore the shard each hash value maps to, so all the data needs to be remapped and moved to the appropriate shard. This is a tedious task and often challenging to implement in a production environment.

To address the above shortcoming of Key Based Sharding, a ‘Consistent Hashing’ strategy can be used.

Consistent Hashing-

In this strategy, hash values are generated both for the data input and the shard, based on the number generated for the data and the IP address of the shard machine, respectively. These two hash values are arranged around a ring or a circle utilizing the 360 degrees of the circle. The hash values that are close to each other are made into a pair, which can be done either clockwise or anti-clockwise.

The data is loaded according to this arrangement of hash values. Whenever a shard is removed, the keys that were mapped to it are attached to the nearest shard on the ring; a similar procedure is adopted when a shard is added. This largely removes the remapping and reorganization problem of the plain hash-key strategy, since only a small fraction of the keys has to move. For example, if a change in the hash function under Key Based Sharding required you to shuffle the data on 3 out of 4 shards, consistent hashing would require shuffling data on far fewer shards. Moreover, any overloading problem is taken care of by adding replicas of the shard.
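A minimal sketch of such a ring in Python; the shard IP addresses, the key names, and the MD5-based ring positions are assumptions for illustration:

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    # position on a 2**32-point ring, standing in for the circle's 360 degrees
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16) % (2 ** 32)

class ConsistentHashRing:
    def __init__(self, shards):
        self.ring = sorted((ring_position(s), s) for s in shards)

    def shard_for(self, key: str) -> str:
        points = [p for p, _ in self.ring]
        # walk clockwise from the key's position to the first shard (with wrap-around)
        i = bisect.bisect(points, ring_position(key)) % len(self.ring)
        return self.ring[i][1]

    def remove(self, shard: str) -> None:
        self.ring = [(p, s) for p, s in self.ring if s != shard]

ring = ConsistentHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])  # assumed shard IPs
before = {k: ring.shard_for(k) for k in ("order-1", "order-2", "order-3", "order-4")}
ring.remove("10.0.0.2")
after = {k: ring.shard_for(k) for k in before}
# only the keys that lived on the removed shard move, and only to a neighbouring shard
moved = [k for k in before if before[k] != after[k]]
```

The point of the sketch is the last three lines: removing a shard relocates only the keys that were on it, instead of remapping the whole dataset.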

Range Based Sharding

Range Based Sharding is the process of sharding data based on value ranges. Using our previous database example, we can make a few distinct shards using the Order value amount as a range (lower value and higher value) and divide customer information according to the price range of their order value, as seen below:

Image source: Author

Features of Range Based Sharding are-

There is no hashing function involved, so it is easy to add or remove machines, and there is no need to shuffle or reorganize the data.

On the other hand, this type of sharding does not ensure evenly distributed data. It can result in overloading a particular shard, commonly referred to as a database hotspot.
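The range lookup itself is simple to express in code. A sketch follows; the order-value boundaries and shard names are made up for illustration:

```python
from typing import List, Tuple

# Assumed order-value boundaries and shard names
RANGES: List[Tuple[float, float, str]] = [
    (0, 100, "shard-0"),
    (100, 500, "shard-1"),
    (500, float("inf"), "shard-2"),
]

def shard_for_order_value(value: float) -> str:
    """Route a row to a shard based on which range its order value falls in."""
    for low, high, shard in RANGES:
        if low <= value < high:
            return shard
    raise ValueError(f"no shard covers order value {value}")

print(shard_for_order_value(42.50))   # small orders land on shard-0
print(shard_for_order_value(9_999))   # large orders land on shard-2
```

If most orders fall into one range, that shard becomes the hotspot described above, which is the main weakness of this strategy.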

Directory-Based Sharding

This type of sharding relies on a lookup table (with the specific shard key) that keeps track of the stored data and the concerned allotted shards. It tells us exactly where the particular queried data is stored or located on a specific shard. This lookup table is maintained separately and does not exist on the shards or the application. The following image demonstrates a simple example of a Directory-Based Sharding.

Features of Directory-Based Sharding are –

The Directory-Based Sharding strategy is highly adaptable. While Range Based Sharding is limited to specifying ranges of values, and Key Based Sharding depends on a fixed hash function that is challenging to alter later if application requirements change, Directory-Based Sharding enables us to use any method or technique to allocate data entries to shards, and it is convenient to add or reduce shards dynamically.

The only downside of this type of sharding architecture is that there is a need to connect to the lookup table before each query or write every time, which may increase the latency.

Furthermore, if the lookup table gets corrupted, it can cause a complete failure at that instant, known as a single point of failure. This can be overcome by ensuring the security of the lookup table and creating a backup of the table for such events.
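A minimal sketch of the lookup-table idea in Python; the use of country codes as the shard key and the shard names are assumptions for illustration:

```python
# Assumed shard key (country code) and shard assignments
lookup_table = {
    "IN": "shard-0",
    "US": "shard-1",
    "UK": "shard-2",
}

def shard_for_country(code: str) -> str:
    """Consult the lookup table; it alone decides where a row lives."""
    try:
        return lookup_table[code]
    except KeyError:
        # if the table is lost or incomplete, routing fails entirely --
        # the single point of failure discussed above
        raise KeyError(f"no shard registered for country {code!r}")

# shards can be reassigned dynamically just by editing the table
lookup_table["DE"] = "shard-1"
```

This flexibility is why the strategy adapts well, and the `KeyError` path is why the lookup table must be secured and backed up.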

Other than the three main sharding strategies discussed above, there can be many more sharding strategies that are usually a combination of these three.

After this detailed sharding architecture overview, we will now understand the pros and cons of sharding databases.

Benefits of Sharding

Horizontal Scaling: For any non-distributed database on a single server, there will always be a limit to storage and processing capacity. The ability of sharding to extend horizontally makes the arrangement flexible to accommodate larger amounts of data.

Speed: One more reason a sharded database design is preferred is improved query response times. Upon submitting a query to a non-sharded database, it likely has to search every row in the table before finding the result set you are searching for, so queries can become prohibitively slow in an application with a single unsharded database. By sharding one table into multiple tables, queries pass through fewer rows, and their results are returned considerably faster.

Reliability: Sharding can help to improve application reliability by reducing the effect of system failures. If a program or website is dependent on an unsharded database, a failure might render the entire application inoperable. An outage in a sharded database, on the other hand, is likely to affect only one shard. Even if this causes certain users to be unable to access some areas of the program or website, the overall impact would be minimal.

Challenges in Sharding

While sharding a database might facilitate growth and enhance speed, it can also impose certain constraints. We will go through some of them and why they could be strong reasons to avoid sharding entirely.

Increased complexity: Companies usually face the challenge of complexity when designing a sharded database architecture. There is a risk that the sharding operation will result in lost data or damaged tables if done incorrectly. Even when done correctly, ongoing shard maintenance and organization add significant operational overhead.

Shard imbalance: Depending on the sharding architecture, the distribution of data across shards can become imbalanced due to incoming traffic. This forces remapping or reorganizing the data amongst the shards, which is time-consuming and expensive.

Unsharding or restoring the database: Once a database has been sharded, it can be complicated to restore it to its earlier version. Backups of the database produced before it was sharded will not include data written after partitioning. As a result, reconstructing the original unsharded architecture would need either integrating the new partitioned data with the old backups or changing the partitioned database back into a single database, both of which would be undesirable.

Not supported by all databases: It is to be noted that not every database engine natively supports sharding. There are several databases currently available; some popular ones are MySQL, PostgreSQL, Cassandra, MongoDB, HBase, and Redis. Some of them, such as MongoDB, offer a built-in auto-sharding feature. For databases without native support, we need to customize the sharding strategy to suit the application requirements.

Now that we have discussed the pros and cons of sharding databases, let us explore situations when one should select sharding.

When should one go for Sharding?

When the application data outgrows the storage capacity and can no longer be stored as a single database, sharding becomes essential.

When the volume of reading/writing exceeds the current database capacity, this results in a higher query response time or timeouts.

When a slow response is experienced while reading the read replicas, it indicates that the network bandwidth required by the application is higher than the available bandwidth.

Excluding the above situations, it is often possible to optimize the database instead. Options include carrying out server upgrades, implementing caches, creating one or more read replicas, and setting up remote databases. Only when these options cannot solve the problem of growing data should sharding be considered.


We have covered the fundamentals of sharding in Data Engineering and by now have developed a good understanding of this topic.

With sharding, businesses can add horizontal scalability to the existing databases. However, it comes with a set of challenges that need to be addressed. These include considerable complexity, possible failure points, a requirement for additional resources, and more. Thus, sharding is essential only in certain situations.

I hope you enjoyed reading this guide! In the next part of this guide, we will cover how sharding is implemented step-by-step using MongoDB.

Author Bio

She loves traveling, reading fiction, solving Sudoku puzzles, and participating in coding competitions in her leisure time.

You can follow her on LinkedIn, GitHub, Kaggle, Medium, Twitter.

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.


Beginner’s Guide To Image Gradient

So, the gradient helps us measure how the image changes and based on sharp changes in the intensity levels; it detects the presence of an edge. We will dive deep into it by manually computing the gradient in a moment.

Why do we need an image gradient?

Image gradient is used to extract information from an image. It is one of the fundamental building blocks in image processing and edge detection. The main application of image gradient is in edge detection. Many algorithms, such as Canny Edge Detection, use image gradients for detecting edges.

Mathematical Calculation of Image gradients

Enough talking about gradients; let’s now look at how we compute gradients manually. Let’s take a 3*3 image and try to find an edge using an image gradient. We will start by taking a center pixel around which we want to detect the edge. The center pixel has 4 main neighbors:

(i) P(x-1,y) left pixel

(ii) P(x+1,y) right pixel

(iii) P(x,y-1) top pixel

(iv) P(x,y+1) bottom pixel

We will subtract the pixels opposite to each other, i.e. Pbottom – Ptop and Pright – Pleft, which gives us the change in intensity (the contrast) between the opposite pixels.

Change of intensity in the X direction is given by:

Gradient in X direction = PR - PL

Change of intensity in the Y direction is given by:

Gradient in Y direction = PB - PT

Gradient for the image function is given by:

𝛥I = [𝛿I/𝛿x, 𝛿I/𝛿y]

Let us find out the gradient for the given image function:

We can see from the image above that there is a change in intensity levels only in the horizontal direction and no change in the y direction. Let’s try to replicate the above image in a 3*3 image, creating it manually-

Let us now find out the change in intensity level for the image above

GX = PR - PL, Gy = PB - PT
GX = 0 - 255 = -255
Gy = 255 - 255 = 0

𝛥I = [ -255, 0]

Let us take another image to understand the process clearly.

Let us now try to replicate this image using a grid system and create a similar 3 * 3 image.

Now we can see that there is no change in the horizontal direction of the image

GX = PR - PL, Gy = PB - PT
GX = 255 - 255 = 0
Gy = 0 - 255 = -255

𝛥I = [0, -255]

But what if there is a change in the intensity level in both directions of the image? Let us take an example in which the image intensity changes in both directions.

Let us now try replicating this image using a grid system and create a similar 3 * 3 image.

GX = PR - PL, Gy = PB - PT
GX = 0 - 255 = -255
Gy = 0 - 255 = -255

𝛥I = [ -255, -255]
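This last computation can be checked with NumPy. The 3*3 pixel values below are assumed to match the described image, with intensity changing in both directions:

```python
import numpy as np

# Assumed 3x3 image: intensity drops from 255 to 0 in both directions
img = np.array([[255, 255, 255],
                [255, 255,   0],
                [255,   0,   0]], dtype=int)

y, x = 1, 1                         # centre pixel
gx = img[y, x + 1] - img[y, x - 1]  # PR - PL
gy = img[y + 1, x] - img[y - 1, x]  # PB - PT
print([gx, gy])                     # prints [-255, -255]
```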

Now that we have found the gradient values, let me introduce you to two new terms:

Gradient magnitude

Gradient orientation

Gradient magnitude represents the strength of the change in the intensity level of the image. It is calculated by the given formula:

Gradient Magnitude: √((change in x)² +(change in Y)²)

The higher the Gradient magnitude, the stronger the change in the image intensity

Gradient Orientation represents the direction of the change of intensity levels in the image. We can find out gradient orientation by the formula given below:

Gradient Orientation: tan-¹( (𝛿I/𝛿y) / (𝛿I/𝛿x)) * (180/𝝅)

Overview of Filters

We have learned to calculate gradients manually, but we can’t do that each time, especially with large images. We can find the gradient of any image by convolving a filter over the image. To find an edge and its orientation, we find the gradient in both the X and Y directions and then take the resultant of both.

Different filters or kernels can be used for finding gradients, i.e., detecting edges.

3 filters that we will be working on in this article are

Roberts filter

Prewitt filter

Sobel filter

All these filters can be used for different purposes. They are similar to each other but differ in some properties: each provides a horizontal and a vertical edge-detecting kernel, and the filters differ in the values, orientation, and size of these kernels.

Roberts Filter

Suppose we have this 4*4 image

Let us look at the computation

The gradient in x-direction =

Gx = 100*1 + 200*0 + 150*0 - 35*1
Gx = 65

The gradient in y direction =

Gy = 100*0 + 200*1 - 150*1 + 35*0
Gy = 50

Now that we have found out both these values, let us calculate gradient strength and gradient orientation.

Gradient magnitude = √(Gx)² + (Gy)² = √(65)² + (50)² = √6725 ≅ 82
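These Roberts numbers can be verified with NumPy. The 2*2 patch values are taken from the example above; the kernel layout is the standard Roberts cross:

```python
import numpy as np

# Top-left 2x2 patch of the example image and the Roberts cross kernels
patch = np.array([[100, 200],
                  [150,  35]])
kx = np.array([[1, 0],
               [0, -1]])  # one diagonal
ky = np.array([[0, 1],
               [-1, 0]])  # the other diagonal

gx = int((patch * kx).sum())            # 100 - 35  = 65
gy = int((patch * ky).sum())            # 200 - 150 = 50
magnitude = np.sqrt(gx**2 + gy**2)      # sqrt(6725) ~ 82
```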

We can use the arctan2 function of NumPy to find the tan-1 in order to find the gradient orientation

Gradient Orientation = np.arctan2(Gy, Gx) * (180/𝝅) = 37.5685

Prewitt Filter

Prewitt filter is a 3 * 3 filter and it is more sensitive to vertical and horizontal edges as compared to the Sobel filter. It detects two types of edges – vertical and horizontal. Edges are calculated by using the difference between corresponding pixel intensities of an image.

A working example of Prewitt filter

Suppose we have the same 4*4 image as earlier

Let us look at the computation

The gradient in x direction =

Gx = 100*(-1) + 200*0 + 100*1 + 150*(-1) + 35*0 + 100*1 + 50*(-1) + 100*0 + 200*1
Gx = 100

The gradient in y direction =

Gy = 100*1 + 200*1 + 100*1 + 150*0 + 35*0 + 100*0 + 50*(-1) + 100*(-1) + 200*(-1)
Gy = 50

Now that we have found both these values let us calculate gradient strength and gradient orientation.

Gradient magnitude = √(Gx)² + (Gy)² = √(100)² + (50)² = √12500 ≅ 112

We will use the arctan2 function of NumPy to find the gradient orientation

Gradient Orientation = np.arctan2(Gy, Gx) * (180/𝝅) = 26.5651

Sobel Filter

Sobel filter is similar to the Prewitt filter; only the center values are changed from 1 to 2 and from -1 to -2 in both the kernels used for horizontal and vertical edge detection.

A working example of Sobel filter

Suppose we have the same 4*4 image as earlier

Let us look at the computation

The gradient in x direction =

Gx = 100*(-1) + 200*0 + 100*1 + 150*(-2) + 35*0 + 100*2 + 50*(-1) + 100*0 + 200*1
Gx = 50

The gradient in y direction =

Gy = 100*1 + 200*2 + 100*1 + 150*0 + 35*0 + 100*0 + 50*(-1) + 100*(-2) + 200*(-1)
Gy = 150

Now that we have found out both these values, let us calculate gradient strength and gradient orientation.

Gradient magnitude = √(Gx)² + (Gy)² = √(50)² + (150)² = √25000 ≅ 158
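The Sobel numbers can be verified the same way. The 3*3 patch values below are inferred from the Gx and Gy terms above:

```python
import numpy as np

# Assumed 3x3 patch (consistent with the Gx/Gy terms) and the Sobel kernels
patch = np.array([[100, 200, 100],
                  [150,  35, 100],
                  [ 50, 100, 200]])
kx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
ky = np.array([[ 1,  2,  1],
               [ 0,  0,  0],
               [-1, -2, -1]])

gx = int((patch * kx).sum())        # 50
gy = int((patch * ky).sum())        # 150
magnitude = np.sqrt(gx**2 + gy**2)  # sqrt(25000) ~ 158
```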

Using the arctan2 function of NumPy to find the gradient orientation

Gradient Orientation = np.arctan2(Gy, Gx) * (180/𝝅) = 71.5650

Implementation using OpenCV

We will perform the program on a very famous image known as Lenna.

Let us start by installing the OpenCV package

#installing opencv
!pip install opencv-python

After we have installed the package, let us import the package and other libraries

Using Roberts filter

Python Code:

Using Prewitt Filter

#Converting image to grayscale
gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
#Creating Prewitt filter
kernelx = np.array([[1,1,1],[0,0,0],[-1,-1,-1]])
kernely = np.array([[-1,0,1],[-1,0,1],[-1,0,1]])
#Applying the filters to the grayscale image in both x and y directions
img_prewittx = cv2.filter2D(gray_img, -1, kernelx)
img_prewitty = cv2.filter2D(gray_img, -1, kernely)
#Taking the root of the squared sum (np.hypot) of both directions and displaying the result
prewitt = np.hypot(img_prewitty, img_prewittx)
prewitt = prewitt.astype('int')
plt.imshow(prewitt, cmap='gray')


Using Sobel filter

gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
kernelx = np.array([[-1,0,1],[-2,0,2],[-1,0,1]])
kernely = np.array([[1, 2, 1],[0, 0, 0],[-1,-2,-1]])
img_x = cv2.filter2D(gray_img, -1, kernelx)
img_y = cv2.filter2D(gray_img, -1, kernely)
#taking root of squared sum and displaying result
new = np.hypot(img_x, img_y)
plt.imshow(new.astype('int'), cmap='gray')



This article taught us the basics of the image gradient and its application in edge detection. The image gradient is one of the fundamental building blocks of image processing: it is the directional change in the intensity of the image, and its main application is edge detection, since a sharp change in intensity can mark the boundary of an object. We can compute the gradient, its magnitude, and its orientation manually, but in practice we use filters, which come in many kinds for different purposes. The filters discussed in this article are the Roberts filter, Prewitt filter, and Sobel filter. We implemented the code in OpenCV, using all these 3 filters to compute gradients and eventually find the edges.

Some key points to be noted:

Image gradient is the building block of any edge detection algorithm.

We can manually find out the image gradient and the strength and orientation of the gradient.

We learned how to find the gradient and detect edges using different filters coded using OpenCV.

A Complete Guide To Pl/Sql With

Introduction to PL/SQL WITH

The PL/SQL WITH clause is used for subquery refactoring in complex queries that use the same result set of a particular query again and again. It is similar to the subqueries used in SQL, but using the WITH clause reduces the complexity of executing and debugging a very complex query. The WITH clause is also used when we need a temporary result set to be used by our query once or multiple times, without creating any view or temporary table.


Syntax of PL/SQL WITH

The WITH clause can be used with any of the SQL statements SELECT, INSERT, UPDATE, or DELETE. The syntax of the WITH clause is given below:

WITH new_temp_table (list of new columns) AS
(statement for filling temp table)
statement(s) which use the temp table;

In the above syntax, new_temp_table is the temporary table that can be referenced for retrieving the temporary result acquired by executing the statement for filling the temp table. The list of columns specified in brackets gives the names of the columns of the temporary table, and it should match the number of column values retrieved by the statement for filling the temp table. The AS keyword is used after the list of columns. The other statement(s) using the result of new_temp_table are specified after the query that fills it.

Execution of Statement Containing WITH Clause

When a query statement containing the WITH clause is executed, the result set of the WITH clause is calculated first and stored as the temporary result set in the new temp table.

After this, the statement or statements which are specified other than the with clause are executed. These statements can make the use of the new temp table in it as many times as required. They will be executed after retrieving the result in the new temp table.

The temporary table that is the new temp table is also called CTE which is a common table expression.

Usage of WITH Clause

There are many scenarios in which the usage of with clause may prove helpful in query execution and data management.

Here are some of the key cases where it is most useful for efficiently executing and debugging complex queries:

When it is not possible to create a view in the database for use in query statements.

When we have to make use of the same result set multiple times inside the query statement.

When we have to use the recursion while retrieving the result set.

Examples of PL/SQL WITH

Given below are the examples of PL/SQL WITH:

Example #1

Suppose we have a table named employee_details which contains the records as shown in the output of the below query statement.


SELECT * FROM [employee_details]


Now suppose, that we have to find out how much does salary the owner of the fruits and vegetable stores is to all the employees working in that particular store. In that case, the total salary for each store can be calculated as follows by using the with a clause for storing temporary results.

WITH temp_table AS (
SELECT store_id AS "store", SUM(salary) salary
FROM employee_details
WHERE store_id IS NOT NULL
GROUP BY store_id
)
SELECT store, salary AS "storewise salary"
FROM temp_table
WHERE store IN ('VEGETABLES', 'FRUITS');


The above query statement executes the WITH query first, storing the details of all the stores and their total salaries in the table temp_table. After this, the other statement executes, retrieving the results for only those stores whose store column contains VEGETABLES or FRUITS. The output of the above query statement is shown below, with the two stores and the total salary to be paid by the owner for each of the stores respectively.

Example #2

Using the temporary result of WITH clause multiple times.

Let’s take one more example to understand the usage of with clause. There is one more table containing alternative contact mobile numbers of the employees named contact_details. The contents of this table can be retrieved by using the following query statement.


SELECT * FROM [contact_details]

Now, consider that we need to retrieve the records from the employee_details table that have a salary greater than the average salary. Along with that we also have to retrieve the mobile number present in the contact details table for that corresponding employee. In this case, we have to make use of the temporary table created by the with clause twice. One will be while calculating the average salary and the other one while retrieving the result set of it for getting the column details of those employees.


WITH temp_table AS (
SELECT f_name, l_name, contact_number, mobile_number, salary
FROM employee_details
INNER JOIN contact_details
ON employee_details.employee_id = contact_details.employee_id
ORDER BY f_name
)
SELECT f_name AS "First Name", l_name AS "Last Name", contact_number AS "Contact Number 1", mobile_number AS "Contact Number 2", salary
FROM temp_table
WHERE salary > (SELECT AVG(salary) FROM temp_table);



We can make use of the WITH clause to get a result set that can be used inside a query statement as a temporary table. The result set of the WITH clause can be referenced once or multiple times inside the same query statement. The WITH clause is most often used in complex query statements for efficient and easy execution and debugging.

Recommended Articles

We hope that this EDUCBA information on “PL/SQL WITH” was beneficial to you. You can view EDUCBA’s recommended articles for more information.

Complete Guide To A Decentralized Exchange – Pancakeswap

This article was published as a part of the Data Science Blogathon.


our assets. Traditionally, we use a centralized exchange to trade our assets. These centralized exchanges are the middlemen. Our crypto assets are stored on these exchanges, and the exchanges have control over our keys. If these exchanges decide to suspend withdrawals or any features on the exchange, our assets will be stuck on the exchange. This was a similar case that happened to exchanges such as Celsius, Voyager, and Vauld. Due to this, many users’ funds to date are stuck in these exchanges.

Blockchain resolves the above issues by introducing the concept of decentralization. Blockchain technology eliminates the middlemen, giving users control of their assets.

In this guide, I will walk you through how to use one of the most popular decentralized exchanges in the space – PancakeSwap.

What is a Decentralized Exchange (DEX)?

Decentralization brings power into the hands of the people. A decentralized exchange is a decentralized application (dapp) built on a blockchain that allows users to trade their assets in a decentralized manner. As a result, a user does not have to deposit their funds on the exchange. A DEX places control in the hands of the user. All the transactions that a user wants to perform take place via a crypto wallet, and for these transactions to take place, a user must pay fees which is very little compared to a centralized exchange. Hence, users can trade assets without actually depositing these assets on the exchange.

Every DEX is built on a blockchain. For example, UniSwap is a DEX built on the Ethereum blockchain, Raydium is a DEX built on the Solana blockchain, and PancakeSwap is a DEX built on the Binance Smart Chain network. Interacting with a DEX requires some fees, which are very small compared to a CEX. The fees are paid in the underlying blockchain’s native token.

For example, to interact with UniSwap, a user is supposed to pay fees in ETH. For Raydium a user is supposed to pay fees in SOL and for PancakeSwap, a user is supposed to pay fees in BNB.

What is PancakeSwap?

PancakeSwap is a decentralized exchange that is built on the Binance Smart Chain Network (BSC). PancakeSwap allows users to easily swap, stake, farm their crypto assets, participate in contests, and offer many more features.

To access PancakeSwap, a user must use a wallet connected to the Binance Smart Chain Network.

Furthermore, Users are supposed to pay fees in the form of BNB tokens, to perform any transaction on the dapp.

Wallet Used in PancakeSwap

As specified above, we are supposed to use a crypto wallet to interact with a decentralized application. In this guide, I will use the MetaMask web extension wallet to perform any transactions with the DEX.

MetaMask is a crypto wallet where users can send, receive, swap, store, and buy digital assets easily and securely.

Connect your Web3 Wallet to the Dapp

The first step to accessing a DEX is to connect your crypto wallet; here, in our case, the MetaMask wallet. You can connect your crypto wallet by following the steps below:

Select your preferred wallet.

Your wallet will now open on your browser. Select which account of your wallet you want to connect to the DEX.

Once connected, you will notice your wallet address on the home page as well as in the top right-hand corner of the application.

Your wallet is now connected to the DEX. Hence you can now interact with the DEX features.

Swap Tokens

Swapping refers to the process of converting one token to another. For example, converting BNB to ETH.

Users can swap their tokens the following way:

Since we are on the home page, hover over the Trade option in the navbar.

Select the Swap option.

Select or search for the token with which you want to swap the respective token.

Enter the amount of the token you want to swap.

Once the swap is complete, in your wallet, you will notice a decrease in the quantity of the tokens you swapped with and an increase in the quantity of the tokens you swapped it for.

Limit Orders

The limit option in PancakeSwap works like a limit order in trading. Limit here refers to the particular price at which you want to swap the respective token. You can set the price at which you want the swap to be performed in the limit option; only if that price is met will the transaction be executed.

For example, you can set the limit price to $200 to swap BNB for BUSD. Only if the price of BNB reaches $200 will the transaction take place.

Users can swap their tokens using the limit option in the following way:

Select the Limit option below the navbar.

Select or search for the token with which you want to swap the respective token.

Enter the amount of the token you want to swap.

Enter the price at which you want the swap to take place.

Scroll down to see your open orders. Your open orders are the orders which are still pending to be executed.

The order history option shows you all your past executed swaps using the limit option.

Once the limit price is met, the swap will be executed, and in your wallet, you will notice a decrease in the quantity of the tokens you swapped with and an increase in the quantity of the tokens you swapped for.
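The execution rule described above can be sketched as a single check (the function name and the buy/sell convention are assumptions for illustration, not PancakeSwap's internals):

```python
# Minimal sketch of the limit-order rule: the swap only executes
# once the market price reaches the limit price you set.

def limit_order_fills(side, limit_price, market_price):
    """A sell limit fills at or above the limit price;
    a buy limit fills at or below it."""
    if side == "sell":
        return market_price >= limit_price
    return market_price <= limit_price

# Example from the text: swap BNB for BUSD only once BNB hits $200.
print(limit_order_fills("sell", 200, 195))  # False - order stays open
print(limit_order_fills("sell", 200, 200))  # True - order executes
```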

Provide Liquidity

Liquidity refers to lending your crypto to the exchange to increase the liquidity of the protocol. A protocol with high liquidity can execute swaps easily and with minimal price impact.

Users are required to provide liquidity in the form of two tokens, supplied in equal value (a 50-50 split by value).

For example, if you are providing liquidity to the USDT-BUSD pair, you must supply equal dollar values of USDT and BUSD, which for two $1 stablecoins means roughly equal quantities.

In return, liquidity providers receive LP tokens, which they can stake to earn additional yield. Providers are also rewarded 0.17% of the fees generated by the respective trading pair.
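The equal-value rule and the fee share can both be worked out with simple arithmetic. A hedged sketch (the function names and the prices are illustrative assumptions; only the 0.17% LP fee figure comes from the text above):

```python
# The 50-50 rule: the pool's current price fixes the amount of the
# second token once you choose the amount of the first.

def matching_amount(amount_a, price_a, price_b):
    """Amount of token B whose value equals `amount_a` of token A."""
    return amount_a * price_a / price_b

# Supplying 2 BNB (assumed at $300) to a BNB-BUSD pool needs 600 BUSD.
print(matching_amount(2, 300, 1))  # 600.0

# LPs also earn 0.17% of each trade in their pair, pro rata.
def lp_fee_share(trade_volume, pool_share, lp_fee=0.0017):
    return trade_volume * lp_fee * pool_share

# Owning 1% of the pool on $1,000,000 of daily volume:
print(round(lp_fee_share(1_000_000, 0.01), 2))  # 17.0
```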

Users can provide liquidity by following the steps below:

Since we are on the limit page, hover over the Trade option in the navbar.

Select the Liquidity option.

Select the pair to which you want to add liquidity.

Enter the amount of liquidity you want to provide.

Once the transaction is processed, you will notice the number of tokens you used to supply liquidity is decreased, and you will notice a new token in your wallet, the LP token.

Earn Features

Let us now put our crypto to work. In the Earn section, you can earn yield by staking your tokens or LP tokens, similar in spirit to dividends in the stock market. Users stake their tokens for either a fixed (locked) or flexible period and, depending on the yield rates, earn the respective returns.

Farms (LP Staking)

The Farm option is similar to the Pool option; the difference is that in Farms users stake their LP tokens rather than single tokens, earning an additional yield on top of trading fees.

Users can stake their LP tokens by following the steps below:

Hover over the Earn option in the navbar.

Select the Farm option.

On the Farm page, select the LP tokens you wish to stake.

Enter the amount of LP tokens you want to stake.

To calculate your ROI follow the steps below:

Enter your investment amount.

Select the duration of your lockup period.

Select the compounding period of your profits.

These are your results.

The returns shown are not guaranteed: the APR fluctuates from day to day, so your actual results may be higher or lower.
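The math behind such an ROI calculator is ordinary compound interest. As an illustrative sketch (the function name, the 40% APR, and the amounts are assumptions, and as noted above the APR changes daily, so any result is only an estimate):

```python
# Compound a stated APR over the lockup period at the chosen
# compounding frequency to estimate staking returns.

def staking_roi(principal, apr, days, compounds_per_year=365):
    """Future value of `principal` staked for `days` at `apr`
    (e.g. 0.40 for 40%), compounded `compounds_per_year` times a year."""
    periods = compounds_per_year * days / 365
    rate_per_period = apr / compounds_per_year
    return principal * (1 + rate_per_period) ** periods

# $1,000 at an assumed 40% APR for 90 days, compounded daily:
value = staking_roi(1_000, 0.40, 90)
print(round(value - 1_000, 2))  # estimated profit, a bit over $100
```

Daily compounding is why the calculator asks for a compounding period: the more often profits are reinvested, the higher the effective yield for the same APR.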

Pools (Token Staking)

The Pool section is used to stake your tokens to earn yield. You can stake your tokens for a fixed (locked) period or a flexible period. The yield rates for fixed staking are much higher than for flexible staking.

Users can stake their tokens by following the steps below:

Select the Pool option from below the navbar.

On the Pool page, select the token you wish to stake.

Follow the steps below for staking via locked or flexible method.


In flexible staking, users can withdraw their staked tokens whenever they please.

Enter the number of tokens you want to stake.


In locked staking, users stake their tokens for a fixed time. Users will not be able to withdraw their tokens until maturity (similar to FD). Locked staking provides higher rates.

Enter the number of tokens you want to stake.

Select the duration you want to lock your tokens for.

Once the transaction has been confirmed, you will notice the number of tokens you allocated to staking has been decreased in your wallet. Once the duration of your staking is complete, the tokens, along with interest will be automatically transferred to your wallet.

To calculate your ROI follow the steps below:

Enter your investment amount.

Select the duration of your lockup period.

These are your results.

The returns shown are not guaranteed: the APR fluctuates from day to day, so your actual results may be higher or lower.


Prediction

The Prediction section is like gambling: users bet whether the price of $CAKE or $BNB will go up or down over the next 5 minutes. A correct prediction pays the stake multiplied by the payout; an incorrect one loses the entire stake.

For example, suppose a user invests $10, the payout is 2x, and the user's prediction is UP. If, after 5 minutes, the asset price is above the price at which the user entered, the user earns 2 × 10 = $20.
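That payoff rule is simple enough to express directly. A hedged sketch (the function name and the prices are illustrative assumptions, not the protocol's actual interface):

```python
# A correct call returns stake * payout multiplier; a wrong call
# loses the entire stake.

def prediction_result(stake, payout, direction, entry_price, final_price):
    went_up = final_price > entry_price
    won = (direction == "UP") == went_up
    return stake * payout if won else 0.0

# $10 on UP with a 2x payout, and the price rises: collect $20.
print(prediction_result(10.0, 2, "UP", 300.0, 305.0))  # 20.0
# Same bet, but the price falls: the stake is lost.
print(prediction_result(10.0, 2, "UP", 300.0, 295.0))  # 0.0
```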

Users can predict by following the steps below:

Hover over the Win option in the navbar.

Select the Prediction option.

On the Prediction page, you will notice the current and next predictions.

You can see the payouts for both predictions.

You can notice the time remaining for the next round to start.

Enter the amount you wish to invest.

If you win, you will receive the payout and your initial investment directly in your wallet.

Bonus Tip!

You must have noticed a slippage option when performing any transaction, such as swapping or adding liquidity.

Slippage refers to the price change between the expected order price and the executed order price.

For example, with a 10% slippage tolerance, your swap will still execute as long as the executed price stays within 10% of the quoted price; if the price moves further than that before the transaction confirms, it fails.
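In practice, the tolerance translates into a minimum output you are willing to accept. A small sketch of that arithmetic (the function name and the amounts are illustrative assumptions):

```python
# With a slippage tolerance set, the trade executes only if the
# received amount stays within that percentage of the quote.

def min_amount_out(quoted_amount, slippage_pct):
    """Smallest output you will accept, given a slippage tolerance."""
    return quoted_amount * (1 - slippage_pct / 100)

# Expecting 300 BUSD with 10% slippage: accept no less than 270 BUSD.
print(min_amount_out(300, 10))  # 270.0
```

A tighter tolerance protects you from bad fills but makes the transaction more likely to fail in a volatile market.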

To edit the slippage, follow the steps below:

Enter or select your preferred slippage percent.


Also, before visiting any dapp site, ensure that the URL is correct. You may do this by visiting the protocol’s social media profiles, such as Twitter, and using the links in the bio. There have been several scams as a result of phishing links.

Key Takeaways

Decentralized exchange gives control to the people.

A decentralized exchange allows users to trade assets without depositing them on the exchange.

PancakeSwap is a decentralized exchange built on the Binance Smart Chain (BSC) network. Transaction fees on the exchange are paid in the BNB coin.

PancakeSwap allows users to swap, stake, farm, and participate in contests and lottery and offers many more features.

Decentralized applications have risks associated with them. Please conduct intensive research before connecting to any decentralized platform.

Lastly, stay safe in crypto and remember, “not your key, not your crypto.”

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

