You are reading the article **Traversing The Trinity Of Statistical Inference Part 2: Confidence Intervals**, updated in February 2024 on the website Achiashop.com.

This article was published as a part of the Data Science Blogathon

This is the second article of the series on ‘Traversing the Trinity of Statistical Inference’. In this article, we’ll discuss the concept of confidence intervals. Before moving to the concept, we’ll take a slight detour and revise the ideas discussed in the previous article of this series.

We started with the example of a beverage company that is interested in knowing about the proportion of people who prefer tea over coffee. We stepped into the shoes of a statistician and began analyzing our experiment. The experiment involved surveying a group of 850 people (n = 850) and noting down their preference in the binary system of 0 (indicating preference of coffee) and 1 (indicating preference of tea). We defined a series of random variables X1, X2, …, Xn that follow a Bernoulli distribution to model our experiment.

We then explored certain basic properties of Xi and then introduced the idea of estimation. The sample-mean estimator (calculated as the average of our observations) was used for estimation and its properties were discussed in the light of the Law of Large Numbers (LLN) and the Central Limit Theorem (CLT). We concluded by evaluating the performance of our estimator through various metrics including bias, efficiency, quadratic risk, consistency, and asymptotic normality.

Now that we have familiarised ourselves with the fundamentals of estimation, we can take a step ahead and explore the second pillar of the realm of statistical science: confidence intervals. The purpose of confidence intervals, in layperson terms, is to create error bars around the estimated value of an unknown parameter. So, there are two aspects of a confidence interval:

Confidence: It indicates the level of surety we wish to attain.

Interval: It indicates a range of plausible values for the unknown parameter, built around the value of our estimator.

“Is there a range of possible outcomes that the estimate can take depending upon how confident we want to be? What is that range?” Let’s begin!

Topics:

A) Notation & Basic Properties of the Gaussian distribution

B) Asymptotic Normality of the Sample Mean Estimator

C) A General Notion for Confidence Intervals

D) Deriving the Asymptotic Confidence Intervals for the Sample Mean Estimator

E) Plug-in Method and the Frequentist Interpretation

F) Conservative Bound Method

G) Quadratic Method

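A) Notations & Basic Properties of the Gaussian distribution

1) Linear transformation of a Gaussian distribution: Let X follow a Gaussian distribution with mean µ and variance σ². Suppose we are given a random variable Y that is a linear transformation of X such that:

$$Y = aX + b$$

Then Y also follows a Gaussian distribution, with mean aµ + b and variance a²σ²:

$$Y \sim \mathcal{N}(a\mu + b,\; a^2\sigma^2)$$

2) Standardization of a Gaussian distribution: If X follows a Gaussian distribution with mean µ and variance σ², then the standardized form of X will follow a standard normal distribution i.e., a Gaussian with mean 0 and variance 1. Mathematically,

$$Z = \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1)$$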

3) Cumulative Distribution Function (CDF) of a Gaussian Distribution: If Z follows a standard Gaussian distribution, then the following notation is used for the CDF of Z (the probability that Z is less than or equal to some number t):

$$\Phi(t) = P(Z \le t)$$

Using this notation (and the property of standardization), we can obtain the CDF of any random variable X that follows a Gaussian distribution with mean µ and variance σ²:

$$P(X \le t) = P\!\left(\frac{X - \mu}{\sigma} \le \frac{t - \mu}{\sigma}\right) = \Phi\!\left(\frac{t - \mu}{\sigma}\right)$$

Graphically, Φ(t) denotes the area below the standard Gaussian curve between -infinity and t. That is,

$$\Phi(t) = \int_{-\infty}^{t} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$$

(Image by Author)

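4) Quantiles of a Gaussian Distribution and its properties: If,

$$\Phi(t) = x, \quad \text{then} \quad t = \Phi^{-1}(x)$$

Where Φ⁻¹ is the inverse of the CDF of the Gaussian. Essentially, Φ⁻¹(x) gives a value t such that the area below the standard Gaussian curve between -infinity and t is equal to x. We define the quantiles of the Gaussian as follows: for α ∈ (0, 1), q_α is the value that leaves an area of α to its right under the standard Gaussian curve,

$$q_\alpha = \Phi^{-1}(1 - \alpha), \quad \text{so that} \quad P(Z \le q_\alpha) = 1 - \alpha \quad \text{and} \quad P(Z > q_\alpha) = \alpha$$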

Graphically,

(Image by Author)

How are all these properties going to be useful to us? These properties will be relevant in the context of the asymptotic normality of the sample mean estimator, which has been discussed in the next section.
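The notation of this section can be tried out numerically. The following is a minimal Python sketch, assuming the convention that q_α leaves an area of α to its right under the standard Gaussian curve (so q_α = Φ⁻¹(1 – α)). The helper names `phi`, `phi_inv`, `q` and `gaussian_cdf` are hypothetical and chosen only for illustration; the only library call used is the standard-library `statistics.NormalDist`.

```python
from statistics import NormalDist

Z = NormalDist(mu=0.0, sigma=1.0)  # the standard Gaussian

def phi(t: float) -> float:
    """CDF of the standard Gaussian: P(Z <= t)."""
    return Z.cdf(t)

def phi_inv(x: float) -> float:
    """Inverse CDF: the value t for which P(Z <= t) = x."""
    return Z.inv_cdf(x)

def q(alpha: float) -> float:
    """Quantile q_alpha with P(Z > q_alpha) = alpha (assumed convention)."""
    return phi_inv(1.0 - alpha)

def gaussian_cdf(t: float, mu: float, sigma: float) -> float:
    """CDF of a Gaussian with mean mu and variance sigma^2, via standardization."""
    return phi((t - mu) / sigma)

# Quantiles used later for 90% and 95% two-sided intervals.
print(round(q(0.05), 3), round(q(0.025), 3))
```

Under this convention, q(0.05) and q(0.025) are the values that the later sections obtain "using any statistical software".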

B) Asymptotic Normality of The Sample Mean Estimator

The application of the Central Limit Theorem (CLT) on the sample mean estimator gave us the following result:

$$\sqrt{n}\,\frac{\hat{p} - p}{\sqrt{p(1 - p)}} \xrightarrow{(d)} \mathcal{N}(0, 1)$$

It shows that the standardized version of the sample mean estimator converges (“in distribution”) to the standard Gaussian distribution. This property of estimators was called asymptotic normality. In fact, by using the properties of the normal distribution, we can also conclude that, for large n, the sample mean estimator itself approximately follows a normal distribution:

$$\hat{p} \approx \mathcal{N}\!\left(p,\; \frac{p(1 - p)}{n}\right)$$

Property used: If X follows a Gaussian distribution with mean µ and variance σ², then aX + b follows a normal distribution with mean aµ + b, and variance a²σ². Let’s talk about this asymptotic normality more. In general, an estimator (or equivalently an estimate) θ-hat for parameter θ is said to exhibit asymptotic normality if:

$$\sqrt{n}\,(\hat{\theta} - \theta) \xrightarrow{(d)} \mathcal{N}(0,\; \sigma_\theta^2)$$

Where σθ² is referred to as the asymptotic variance of the estimator θ-hat.

Although not of much relevance, we can use the above definition (and the properties of the Gaussian distribution) to obtain the asymptotic variance of our sample mean estimator as follows:

$$\sqrt{n}\,(\hat{p} - p) \xrightarrow{(d)} \mathcal{N}(0,\; p(1 - p))$$

Thus the asymptotic variance of p-hat is p(1 – p). You might be wondering how all this is related to the idea of confidence intervals. Asymptotic normality allows us to garner information about the distribution of the sample mean estimator. Since we know that the above function of the sample mean estimator follows a Gaussian distribution for large sample sizes, we can calculate the probability that it lies in a certain interval A. Mathematically, we can say that (for large sample size):

$$P\!\left(\sqrt{n}\,(\hat{p} - p) \in A\right) \approx P\!\left(\mathcal{N}(0,\; p(1 - p)) \in A\right)$$

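Generally, it’s easier to play around with the following form:

$$P\!\left(\hat{p} - p \in A\right) \approx P\!\left(\mathcal{N}\!\left(0,\; \frac{p(1 - p)}{n}\right) \in A\right)$$

(Equation 1)

This is the core equation of this entire article. All new concepts will be built upon this equation.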
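The asymptotic normality discussed in this section can be checked with a small simulation. The sketch below is only illustrative: it assumes a hypothetical true value p = 0.64 (in practice p is unknown), draws many samples of size n = 850, standardizes each sample mean, and compares the fraction of standardized values falling in [-1.96, 1.96] with the Gaussian probability of the same interval. The function names are made up for this example.

```python
import random
from statistics import NormalDist

def standardized_mean(n: int, p: float, rng: random.Random) -> float:
    """Draw n Bernoulli(p) observations; return sqrt(n) * (p_hat - p) / sqrt(p * (1 - p))."""
    p_hat = sum(rng.random() < p for _ in range(n)) / n
    return (n ** 0.5) * (p_hat - p) / (p * (1 - p)) ** 0.5

def fraction_within(t: float, n: int = 850, p: float = 0.64,
                    reps: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated standardized means that land in [-t, t]."""
    rng = random.Random(seed)
    hits = sum(abs(standardized_mean(n, p, rng)) <= t for _ in range(reps))
    return hits / reps

empirical = fraction_within(1.96)
theoretical = 2 * NormalDist().cdf(1.96) - 1  # P(|Z| <= 1.96) for a standard Gaussian
print(empirical, round(theoretical, 4))
```

The two numbers should be close but not identical, since the Gaussian description only holds approximately for a finite n.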

C) A General Notion for Confidence Intervals

First, we’ll talk about a general notion (mathematically of course) of confidence intervals. We’ll then apply this notion to our example, mix it up with asymptotic normality, and develop something spicy.

Let Y1, Y2, …, Yn be some independent and identically distributed random variables with the following statistical model:

We introduce a new variable called α such that α ∈ (0, 1). Our goal is to create a confidence interval 𝕀 for the true parameter θ such that the probability that θ lies in 𝕀 is more than or equal to 1 – α. Mathematically,

$$P(\theta \in \mathbb{I}) \ge 1 - \alpha$$

The quantity 1 – α is called the level of the confidence interval. For instance, in our example, we may want to create a confidence interval 𝕀 such that the probability that p belongs to 𝕀 is more than or equal to 90% (or 0.9):

$$P(p \in \mathbb{I}) \ge 0.90$$

In other words, we fix the level of our confidence interval at 1 – α = 0.90 (that is, α = 0.10) and compute the confidence interval accordingly. Now, it won’t be easy for us to create confidence intervals for finite sample sizes. This is because the information about the distribution of the sample mean estimator is given by the Central Limit Theorem, which assumes large sample sizes (n approaches infinity). So instead of finite-sample confidence intervals, we introduce asymptotic confidence intervals, which are defined as follows:

$$\lim_{n \to \infty} P(\theta \in \mathbb{I}) \ge 1 - \alpha$$

The confidence intervals that we are describing for our example are two-sided because we are not interested in an upper or lower boundary that limits the value of our true parameter. On the other hand, in a one-sided confidence interval, our goal is to obtain a tight upper or lower bound for the true parameter. For instance, if we are interested in determining the mean concentration of toxic wastes in a water body, we don’t care about how low the mean could be. Our focus would be to determine how large the mean could be i.e., finding a tight upper bound for the mean concentration of the toxic wastes. This article shall restrict our discussion to two-sided confidence intervals as they are more relevant to our example. Why two-sided confidence interval for our example? Because we are not interested in overestimating or underestimating the true parameter p. Our focus is simply to find an interval that contains p with a certain confidence/level.

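A two-sided confidence interval is generally symmetric about the estimator that we are using to determine the true parameter. In most cases, if θ-hat is our estimator for θ, then the two-sided confidence interval for θ (particular to our estimator) is represented as:

$$\mathbb{I} = [\hat{\theta} - a,\; \hat{\theta} + a]$$

Where a is some constant (which we will find later!). In our example, the confidence interval is described as follows:

$$\mathbb{I} = [\hat{p} - a,\; \hat{p} + a]$$

(Equation 2)

$$\lim_{n \to \infty} P(p \in \mathbb{I}) = \lim_{n \to \infty} P(|\hat{p} - p| \le a) \ge 1 - \alpha$$

(Equation 3)

And for the rest of the article, our goal will be to determine ‘a’.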

Just a few concluding questions:

1) What does 𝕀 depend upon? 𝕀 is going to depend upon the level 1 – α, the sample size n, and the value of the estimator θ-hat. So different estimators yield different confidence intervals.

2) Can 𝕀 depend upon the true parameter θ? The answer is no. This must seem obvious: if 𝕀 was dependent on θ, then we wouldn’t have been able to calculate it. Hence, consider this as a rule of thumb: “Confidence intervals must be independent of the true parameter we are trying to estimate”.

3) Is 𝕀 a random or a deterministic quantity? (Think….) The answer will be discussed later.

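D) Deriving the Asymptotic Confidence Intervals for the Sample Mean Estimator

Now that we are well versed with the general notion and associated terminologies, we can start constructing the confidence interval for our example.

We let A be the following interval:

$$A = [-a,\; a]$$

Where a is some constant. Why this interval only? Because the above interval gives equation 1 a special form:

$$P(|\hat{p} - p| \le a) \approx P\!\left(\left|\mathcal{N}\!\left(0,\; \frac{p(1 - p)}{n}\right)\right| \le a\right)$$

(Equation 4)

Does the LHS seem familiar? Remember equation 3? It is exactly the probability that p belongs to 𝕀, which we want to equal 1 – α.

Now, we shall use the properties of the Gaussian distribution to compute the probability on the right-hand side of the above equation. Using property 2 of Gaussian distributions i.e., standardizing the distribution to get the standard Gaussian Z, we obtain the following equation:

$$P\!\left(|Z| \le \frac{a\sqrt{n}}{\sqrt{p(1 - p)}}\right) = \Phi\!\left(\frac{a\sqrt{n}}{\sqrt{p(1 - p)}}\right) - \Phi\!\left(-\frac{a\sqrt{n}}{\sqrt{p(1 - p)}}\right) = 1 - \alpha$$

Recall that by the property of symmetry of the standard Gaussian, we have Φ(-t) = 1 – Φ(t), so that:

$$2\,\Phi\!\left(\frac{a\sqrt{n}}{\sqrt{p(1 - p)}}\right) - 1 = 1 - \alpha \quad\Longrightarrow\quad \Phi\!\left(\frac{a\sqrt{n}}{\sqrt{p(1 - p)}}\right) = 1 - \frac{\alpha}{2}$$

Recall the definition of quantiles of the Gaussian distribution: q_{α/2} = Φ⁻¹(1 – α/2) is the value that leaves an area of α/2 to its right under the standard Gaussian curve. Thus, we obtain:

$$\frac{a\sqrt{n}}{\sqrt{p(1 - p)}} = q_{\alpha/2} \quad\Longrightarrow\quad a = \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}}$$

So, finally, we’ve obtained an expression for a! Let’s substitute this expression for ‘a’ in equation 2:

$$\mathbb{I} = \left[\hat{p} - \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}},\; \hat{p} + \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}}\right]$$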

So, are we done? Not yet. Recall that the confidence interval cannot depend upon the true parameter p, yet the above expression still contains p. So, now we have another problem: removing the dependency of 𝕀 on p. The question is how? Well, there are 3 ways to resolve this problem.

E) Plug-in Method and the Frequentist Interpretation

The first method, which is possibly the simplest, involves replacing the true parameter p in the expression for 𝕀 with the value of the estimator p-hat i.e., replacing p with the sample mean. This gives us the following result:

$$\mathbb{I}_{\text{plug-in}} = \left[\hat{p} - \frac{q_{\alpha/2}\,\sqrt{\hat{p}(1 - \hat{p})}}{\sqrt{n}},\; \hat{p} + \frac{q_{\alpha/2}\,\sqrt{\hat{p}(1 - \hat{p})}}{\sqrt{n}}\right]$$

Yes, we’ve obtained a confidence interval! We’ll now plug in some real values and calculate 𝕀 for our example. Recall that from our survey we found out that 544 people prefer tea over coffee, while 306 people prefer coffee over tea. So, n = 850 and p-hat = 544/850 = 0.64.

So, now we compute:

The 90% plug-in confidence interval for the proportion of people that prefer tea over coffee.

The 95% plug-in confidence interval for the proportion of people that prefer tea over coffee.

Let’s solve these problems:

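1) 90% plug-in confidence interval implies 1 – α = 0.90, giving us α = 0.10. Using any statistical software, we can obtain that:

$$q_{\alpha/2} = q_{0.05} \approx 1.645$$

Substituting all these results in the expression for 𝕀plug-in, we obtain:

$$\mathbb{I}_{\text{plug-in}} \approx \left[0.64 - 1.645\sqrt{\frac{0.64 \times 0.36}{850}},\; 0.64 + 1.645\sqrt{\frac{0.64 \times 0.36}{850}}\right] \approx [0.6129,\; 0.6671]$$

2) 95% plug-in confidence interval implies 1 – α = 0.95, giving us α = 0.05. Using any statistical software, we can obtain that:

$$q_{\alpha/2} = q_{0.025} \approx 1.96$$

Substituting all these results in the expression for 𝕀plug-in, we obtain:

$$\mathbb{I}_{\text{plug-in}} \approx \left[0.64 - 1.96\sqrt{\frac{0.64 \times 0.36}{850}},\; 0.64 + 1.96\sqrt{\frac{0.64 \times 0.36}{850}}\right] \approx [0.6077,\; 0.6723]$$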

Observation: 𝕀plug-in for 95% confidence is wider than 𝕀plug-in for 90% confidence. This makes sense. The more confident we want to be that our parameter lies in an interval, the wider that interval has to be. Before proceeding further with our discussion of the other methods, I find it very important to put forward a small question (that has a humongous answer):
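The plug-in calculation can be reproduced in a few lines. This is a minimal Python sketch using the survey counts from the article (544 out of 850) and assuming that q_{α/2} is the standard Gaussian quantile Φ⁻¹(1 – α/2); the function name `plug_in_interval` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def plug_in_interval(successes: int, n: int, level: float) -> tuple[float, float]:
    """Asymptotic plug-in confidence interval for a proportion."""
    alpha = 1.0 - level
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # q_{alpha/2}
    p_hat = successes / n                        # sample mean estimator
    half_width = q * sqrt(p_hat * (1.0 - p_hat)) / sqrt(n)
    return p_hat - half_width, p_hat + half_width

for level in (0.90, 0.95):
    low, high = plug_in_interval(544, 850, level)
    print(f"{level:.0%}: [{low:.4f}, {high:.4f}]")
```

Only the quantile changes between the two levels, which is why the 95% interval comes out wider than the 90% one.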

What is the probability that the true p belongs to the interval [0.6077, 0.6723]?

Answer: It’s not 0.95! That might seem confusing. We wanted to create an interval such that the probability that the interval contained p was at least 0.95. We did so and now we are saying that the probability is not 0.95! What’s the mystery? Remember, I had asked you to find if confidence intervals are a random or deterministic quantity. The answer to that question was random, since confidence intervals were dependent upon the estimator, which itself is a random quantity. In other words, the confidence interval for the parameter p depends upon the random sample we’ve chosen. If we had surveyed some other portion of Mumbai’s population, then the sample mean could have taken a different value, say 0.62, giving us a different confidence interval. Since 𝕀 was random, we could make statements such as:

$$P(p \in \mathbb{I}) \ge 0.95$$

But, once we plug in the observed values, then the random 𝕀 assumes a deterministic value i.e., [0.6077, 0.6723]. It is no longer random. The probability that the true p belongs to this interval can be only 1 (if true p is some number that lies between 0.6077 and 0.6723) or 0 (if true p is some number that does not lie between 0.6077 and 0.6723). Mathematically,

$$P(p \in [0.6077,\; 0.6723]) \in \{0,\; 1\}$$

There’s no other probability that’s possible. We are calculating the probability that one deterministic quantity lies in another deterministic quantity. Where’s the randomness? Probabilistic statements require randomness, and in the absence of randomness, the statements do not make much sense.

It’s like asking what’s the probability that 2 is between 1 and 3? Of course, 2 is between 1 and 3, so the probability is 1; it’s a sure event. What’s the probability that 2 is between 3 and 4? Of course, 2 is not between 3 and 4, so the probability is 0; it’s an impossible event. Well, you might think we are back to square 1 since these intervals don’t make sense. What’s the use of that math we did? That’s because we still haven’t understood the interpretation of confidence intervals. (The mystery increases…)

So, how do we interpret 𝕀? Here, we shall discuss the frequentist interpretation of confidence intervals. Suppose we observed different samples i.e., we went across Mumbai surveying several groups of 850 people. For each sample, we’ll construct a 95% confidence interval. Then the true p will lie in at least 95% of the confidence intervals that we created. In other words, 95% is the minimum proportion of confidence intervals that contain the true p. Now that’s randomness. And that’s why we can make probabilistic statements here.
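The frequentist interpretation lends itself to a simulation. The sketch below is illustrative only: it assumes a hypothetical true value p = 0.64 (which we would never know in practice), repeats the survey of 850 people many times, builds a 95% plug-in interval from each sample, and reports the fraction of intervals that contain the true p. The function name `coverage` is made up for this example.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(true_p: float, n: int, level: float, reps: int, seed: int = 1) -> float:
    """Fraction of simulated plug-in intervals that contain true_p."""
    rng = random.Random(seed)
    q = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    covered = 0
    for _ in range(reps):
        # One simulated survey: n independent Bernoulli(true_p) answers.
        p_hat = sum(rng.random() < true_p for _ in range(n)) / n
        half = q * sqrt(p_hat * (1.0 - p_hat) / n)
        covered += (p_hat - half <= true_p <= p_hat + half)
    return covered / reps

estimate = coverage(true_p=0.64, n=850, level=0.95, reps=2000)
print(estimate)
```

Because the interval is asymptotic and the number of repetitions is finite, the reported fraction should be close to 0.95 rather than exactly equal to it.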

F) Conservative Bound Method

In this method, we replace the occurrence of p with the maximum value that the function of p can take. This method may not apply in many situations, but in our case, it works well. The expression we obtained for 𝕀 was:

$$\mathbb{I} = \left[\hat{p} - \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}},\; \hat{p} + \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}}\right]$$

Here we replace the function √(p(1 – p)) with its maximum value. How does that work? Remember that the probability that p belongs to 𝕀 must be at least 1 – α. So, if we substitute the maximum value of the above function, we’ll obtain the maximum width of the confidence interval, and widening the interval can only increase the probability that it contains p, so the requirement of ‘at least 1 – α’ still holds. That’s why it is called the conservative bound method. In fact, conservative confidence intervals have a higher probability of containing the true p because they are wider. So, we are interested in obtaining:

$$\max_{p \in (0, 1)} \sqrt{p(1 - p)}$$

The maximum value can easily be found by using calculus. But instead, I’ll use the graphical approach as it’s more intuitive. The graph for sqrt(p*(1 – p)) is shown below:

[Image by Author (made from Desmos Graphing Calculator)]

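It can be seen that sqrt(p*(1 – p)) is maximised for p = 0.5, and the maximum value of the function is 0.5. Substituting all this in the expression for 𝕀 we obtain,

$$\mathbb{I}_{\text{conserv}} = \left[\hat{p} - \frac{q_{\alpha/2}}{2\sqrt{n}},\; \hat{p} + \frac{q_{\alpha/2}}{2\sqrt{n}}\right]$$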

Thus, we have obtained the conservative confidence interval for our example. We shall now solve the following problems:

Compute:

The 90% conservative confidence interval for the proportion of people that prefer tea over coffee.

The 95% conservative confidence interval for the proportion of people that prefer tea over coffee.

Let’s solve these problems:

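1) 90% conservative confidence interval implies 1 – α = 0.90, giving us α = 0.10. Using any statistical software, we can obtain that:

$$q_{\alpha/2} = q_{0.05} \approx 1.645$$

Substituting all these results in the expression for the conservative confidence interval, we obtain:

$$\mathbb{I}_{\text{conserv}} \approx \left[0.64 - \frac{1.645}{2\sqrt{850}},\; 0.64 + \frac{1.645}{2\sqrt{850}}\right] \approx [0.6118,\; 0.6682]$$

2) 95% conservative confidence interval implies 1 – α = 0.95, giving us α = 0.05. Using any statistical software, we can obtain that:

$$q_{\alpha/2} = q_{0.025} \approx 1.96$$

Substituting all these results in the expression for the conservative confidence interval, we obtain:

$$\mathbb{I}_{\text{conserv}} \approx \left[0.64 - \frac{1.96}{2\sqrt{850}},\; 0.64 + \frac{1.96}{2\sqrt{850}}\right] \approx [0.6064,\; 0.6736]$$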

Notice that the conservative confidence intervals are wider than the plug-in confidence intervals.
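The conservative bound can be sketched in Python as well. The code below first checks on a grid that sqrt(p*(1 – p)) peaks at p = 0.5 with value 0.5, then computes the conservative interval for the survey data (544 out of 850). The function name `conservative_interval` is hypothetical, and q_{α/2} is assumed to be the standard Gaussian quantile Φ⁻¹(1 – α/2).

```python
from math import sqrt
from statistics import NormalDist

# Grid check: sqrt(p * (1 - p)) is largest at p = 0.5, where it equals 0.5.
grid = [i / 1000 for i in range(1, 1000)]
best_p = max(grid, key=lambda p: sqrt(p * (1.0 - p)))
print(best_p, sqrt(best_p * (1.0 - best_p)))

def conservative_interval(successes: int, n: int, level: float) -> tuple[float, float]:
    """Asymptotic interval with sqrt(p * (1 - p)) replaced by its maximum, 0.5."""
    q = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    p_hat = successes / n
    half_width = q / (2.0 * sqrt(n))
    return p_hat - half_width, p_hat + half_width

for level in (0.90, 0.95):
    low, high = conservative_interval(544, 850, level)
    print(f"{level:.0%}: [{low:.4f}, {high:.4f}]")
```

The half-width here does not depend on p-hat at all, only on the level and the sample size.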

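G) Quadratic Method

We shall now discuss the final method, which is possibly the hardest of all three. Although it generally gives good results (in terms of narrower confidence intervals), the process of calculating the intervals is much longer. The idea is as follows:

1) We assume that the true p belongs to 𝕀:

$$\hat{p} - \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}} \le p \le \hat{p} + \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}}$$

This gives us a system of two inequalities:

$$p - \hat{p} \le \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}} \quad \text{and} \quad \hat{p} - p \le \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}}$$

The above inequalities can also be represented as:

$$|\hat{p} - p| \le \frac{q_{\alpha/2}\,\sqrt{p(1 - p)}}{\sqrt{n}}$$

2) We square the above expression, which gives us:

$$(\hat{p} - p)^2 \le \frac{q_{\alpha/2}^2\; p(1 - p)}{n}$$

We replace the ‘≤’ sign with ‘=’ sign and open the brackets to get the following quadratic equation in p:

$$\left(1 + \frac{q_{\alpha/2}^2}{n}\right)p^2 - \left(2\hat{p} + \frac{q_{\alpha/2}^2}{n}\right)p + \hat{p}^2 = 0$$

3) We solve the above quadratic equation to get two solutions which shall be the lower and upper limits of the ‘solved’ confidence interval. Using the quadratic formula,

$$p = \frac{\left(2\hat{p} + \frac{q_{\alpha/2}^2}{n}\right) \pm \sqrt{\left(2\hat{p} + \frac{q_{\alpha/2}^2}{n}\right)^2 - 4\left(1 + \frac{q_{\alpha/2}^2}{n}\right)\hat{p}^2}}{2\left(1 + \frac{q_{\alpha/2}^2}{n}\right)}$$

where the ‘–’ sign gives the lower limit and the ‘+’ sign gives the upper limit of 𝕀solved.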

Yes, the solved confidence interval is that long-expression, which I cannot even fit in a single line. As I said, the interval is narrower, but the process is longer. We shall now solve the following problems:

Compute:

The 90% solved confidence interval for the proportion of people that prefer tea over coffee.

The 95% solved confidence interval for the proportion of people that prefer tea over coffee.

Let’s solve these problems:

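1) 90% solved confidence interval implies 1 – α = 0.90, giving us α = 0.10. Using any statistical software, we can obtain that:

$$q_{\alpha/2} = q_{0.05} \approx 1.645$$

Instead of using that long formula, we’ll obtain a simple quadratic and solve it using any quadratic equation calculator:

$$1.003184\,p^2 - 1.283184\,p + 0.4096 = 0$$

Solving the above equation, we obtain:

$$\mathbb{I}_{\text{solved}} \approx [0.6125,\; 0.6666]$$

2) 95% solved confidence interval implies 1 – α = 0.95, giving us α = 0.05. Using any statistical software, we can obtain that:

$$q_{\alpha/2} = q_{0.025} \approx 1.96$$

Instead of using that long formula, we’ll obtain a simple quadratic and solve it using any quadratic equation calculator:

$$1.004520\,p^2 - 1.284520\,p + 0.4096 = 0$$

Solving the above equation, we obtain:

$$\mathbb{I}_{\text{solved}} \approx [0.6072,\; 0.6716]$$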

Notice that the solved confidence intervals are narrower than the plug-in confidence intervals, but with a difference of only about 0.0001. So, it’s better but the magnitude of complexity is much more than the improvement attained.
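The quadratic method is tedious by hand but short in code. The sketch below assumes the quadratic obtained by squaring |p-hat – p| ≤ q·sqrt(p(1 – p))/sqrt(n) and collecting terms in p, namely (1 + q²/n)p² – (2·p-hat + q²/n)p + p-hat² = 0, and solves it with the quadratic formula. The function name `solved_interval` is hypothetical. An interval built this way is commonly known as the Wilson score interval.

```python
from math import sqrt
from statistics import NormalDist

def solved_interval(successes: int, n: int, level: float) -> tuple[float, float]:
    """Roots of (1 + q^2/n) p^2 - (2 p_hat + q^2/n) p + p_hat^2 = 0."""
    q = NormalDist().inv_cdf(1.0 - (1.0 - level) / 2.0)
    p_hat = successes / n
    a = 1.0 + q * q / n
    b = -(2.0 * p_hat + q * q / n)
    c = p_hat * p_hat
    root = sqrt(b * b - 4.0 * a * c)  # discriminant is positive for 0 < p_hat < 1
    return (-b - root) / (2.0 * a), (-b + root) / (2.0 * a)

for level in (0.90, 0.95):
    low, high = solved_interval(544, 850, level)
    print(f"{level:.0%}: [{low:.4f}, {high:.4f}] width = {high - low:.4f}")
```

Printing the width alongside the limits makes the comparison with the plug-in interval direct.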

This concludes our discussion on confidence intervals. The next and the final article of this series will describe the process of hypothesis testing. Unlike estimation and confidence intervals that gave results in numerical format, hypothesis testing will produce results in a yes/no format. It’s going to be a very exciting and challenging journey ahead!

Conclusion

In this article, we continued with our statistical project and understood the essence of confidence intervals. It’s important to note that we took a very basic and simplified example of confidence intervals. In the real world, the field of confidence intervals is vast. Various probability distributions require a mix of several techniques to create confidence intervals. The purpose of this article was to not only see confidence intervals as a mix of theory and math but also to make us feel the idea. I hope you enjoyed reading this article!

If you liked my article and want to read more of them, visit this link. The other articles of this series will be found on the same link.

Note: All images have been made by the author.

About the Author

I am currently a high school student, who is deeply interested in Statistics, Data Science, Economics, and Machine Learning. I have written two data science research papers. You can find them here.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.


## Securing Apache On Ubuntu – Part 2

My previous article focused on basic security tips and tricks to secure Apache web server in Ubuntu.

You can do this by editing the “apache2.conf” file.

sudo nano /etc/apache2/apache2.conf

Add the following line inside Directory /var/www/html/:

Header always append X-Frame-Options SAMEORIGIN

Save the file and restart Apache.

sudo /etc/init.d/apache2 restart

Now, try to open a web browser to access your web server. Check HTTP response headers in Firebug; you should see X-Frame-Options as shown in the below image.

Disable Etag

Etags, also known as “Entity Tags,” can leak sensitive information from Apache. They allow remote users to obtain details like inode numbers, child process IDs and multipart MIME boundaries through the Etag header. It is recommended to disable Etag.

You can do this by editing the “apache2.conf” file.

sudo nano /etc/apache2/apache2.conf

Add the following line inside Directory /var/www/html/:

FileETag None

Save the file and restart Apache.

Now, try to open a web browser to access your web server. Check HTTP response headers in firebug; you should not see Etag at all.

Disable Old Protocol

You can disable the older HTTP protocol versions using a “mod_rewrite” rule that only allows the HTTP 1.1 protocol.

For this, edit the “apache2.conf” file.

sudo nano /etc/apache2/apache2.conf

Add the following lines inside Directory /var/www/html/:

RewriteEngine On
RewriteCond %{THE_REQUEST} !HTTP/1.1$
RewriteRule .* - [F]

Save the file and restart Apache.
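To see which requests the rewrite condition above would reject, the matching logic can be mimicked outside Apache. The following Python sketch is only an approximation of the rule's behaviour: it assumes the request line has the usual `METHOD path HTTP/version` shape, uses Python's `re` module rather than Apache's regex engine, and escapes the dot in the pattern (the rule in the text leaves it unescaped). The function name `would_be_forbidden` is hypothetical.

```python
import re

# Same idea as `RewriteCond %{THE_REQUEST} !HTTP/1.1$`, with the dot escaped.
HTTP_1_1 = re.compile(r"HTTP/1\.1$")

def would_be_forbidden(request_line: str) -> bool:
    """True if the request line does not end in HTTP/1.1, i.e. the [F] rule would fire."""
    return HTTP_1_1.search(request_line) is None

for line in ("GET /index.html HTTP/1.1", "GET / HTTP/1.0"):
    print(line, "->", "403 Forbidden" if would_be_forbidden(line) else "allowed")
```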

HTTP Request Methods

In Ubuntu, HTTP 1.1 protocol supports many request methods like “OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, CONNECT” which may not be required. It is recommended to enable only HEAD, POST and GET request methods.

To fix this, edit the Apache configuration file.

sudo nano /etc/apache2/apache2.conf

Add the following lines inside Directory /var/www/html/:

<LimitExcept GET POST HEAD>
deny from all
</LimitExcept>

Save the file and restart Apache.

Secure Apache from an XSS Attack

XSS (also known as Cross-site Scripting) is one of the most common application-layer vulnerabilities. It allows an attacker to inject client-side script (such as JavaScript) into pages served by the target web server, so that the script runs in a user’s web browser. It is therefore recommended to enable XSS protection on Apache.

You can do this by editing the Apache configuration file.

sudo nano /etc/apache2/apache2.conf

Add the following line inside Directory /var/www/html/:

Header set X-XSS-Protection "1; mode=block"

Save the file and restart Apache.

Now, try to open a web browser to access your web server. Check HTTP response headers in firebug; you should see X-XSS-Protection Options as shown in the below image.

Protect Cookies with HTTPOnly Flag

To fix this, edit the Apache configuration file.

sudo nano /etc/apache2/apache2.conf

Add the following line inside Directory /var/www/html/:

Header edit Set-Cookie ^(.*)$ $1;HttpOnly;Secure

Save the file and restart Apache.
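Instead of inspecting each header by hand in the browser, the checks described in these sections can be collected in a small script. The sketch below is a hypothetical helper, not part of Apache or any library: it takes response headers as a plain dictionary (however you obtained them) and reports which of the hardening steps above appear to be missing. It assumes a single Set-Cookie header; real responses may carry several.

```python
def audit_headers(headers: dict[str, str]) -> list[str]:
    """Return findings for the hardening steps described in this article."""
    h = {name.lower(): value for name, value in headers.items()}
    findings = []
    if "x-frame-options" not in h:
        findings.append("X-Frame-Options missing")
    if "etag" in h:
        findings.append("ETag still present")
    if h.get("x-xss-protection", "").replace(" ", "") != "1;mode=block":
        findings.append("X-XSS-Protection not set to '1; mode=block'")
    cookie = h.get("set-cookie")
    if cookie is not None:
        flags = {part.strip().lower() for part in cookie.split(";")}
        if "httponly" not in flags or "secure" not in flags:
            findings.append("Set-Cookie lacks HttpOnly/Secure")
    return findings

# Example with made-up header values.
sample = {"ETag": '"2aa6-5c1"', "Set-Cookie": "session=abc123"}
for finding in audit_headers(sample):
    print(finding)
```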

Conclusion

Hitesh Jethva

Over 5 years of experience as an IT system administrator for an IT company in India. My skills include a deep knowledge of Red Hat/CentOS, Ubuntu, nginx and Apache, MySQL, Subversion, Linux, web hosting, web servers, Squid proxy, NFS, FTP, DNS, Samba, LDAP, OpenVPN, HAProxy, Amazon Web Services, WHMCS, OpenStack Cloud, Postfix mail server, security, etc.


## Evernote Open Source Alternatives Part 2: Standard Notes

Evernote open source alternatives Part 2: Standard Notes [UPDATED]

Some people fear that the Evernote ship has too many holes and too few crew members to stay afloat. Others have already started their exodus, looking for comparable services. While there’s no shortage of that, there’s something to be said for open source solutions that have the higher chance of surviving bankruptcy, hacks, and governments. Last time we took a look at Turtl, which turned out to be more of a Google Keep Notes alternative than an Evernote replacement. Now we try Standard Notes for a spin which places an even heavier emphasis on privacy and longevity.

If you have been burned by services that suddenly vanish without a trace, you might want to hear Standard Notes’ spiel, which comes in two parts. The first is end-to-end encryption, which uses your password not just to unlock your account but as an encryption key for each and every note you make on the app. Its second message is one of survivability, at least of your notes. Should Standard Notes go under, you will still have your notes thanks to backups as well as an offline decryption program. And, of course, it’s all open source, including the server backend it uses.

One of Standard Notes’ key strengths when it comes to performance is speed. That’s partly because its basic free account is pretty barebones. Encryption and unlimited sync come for free, as well as offline access on any number of devices. Standard Notes has apps available for all the standard platforms and even has a Web version for everything else. The downside to that free account is that Standard Notes is pretty much a plain text editor. For some, that might be enough; for others, the $9.99 monthly fee is well worth the investment.

UPDATE: Just a quick update and clarification on Standard Notes’ pricing tiers. That $9.99 monthly figure might seem disheartening, but that is simply the full price if you want to only be billed every month. If, however, you have no problems being billed yearly or even every five years, then there are cheaper options available too.

• $4.17 per month is what you’d effectively be paying if you opted to be billed annually. The price you’ll be charged each year will be $49.99, a total price saving of 58%.

• $2.48 per month is the cheapest option, but one where the subscription is good for five years. The total that you’ll be paying when you subscribe is $149, which is a 75% saving in total.
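The effective monthly prices and savings quoted above follow from simple arithmetic against the $9.99 month-to-month price. A minimal Python sketch of that calculation, with a hypothetical helper name `plan_summary`:

```python
MONTHLY_PRICE = 9.99  # month-to-month price quoted in the article

def plan_summary(total_price: float, months: int) -> tuple[float, float]:
    """Return (effective price per month, fractional saving vs. paying monthly)."""
    effective = total_price / months
    saving = 1.0 - total_price / (MONTHLY_PRICE * months)
    return effective, saving

for label, total, months in (("annual", 49.99, 12), ("five-year", 149.00, 60)):
    effective, saving = plan_summary(total, months)
    print(f"{label}: ${effective:.2f}/month, saving {saving:.0%}")
```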

The Extended account brings out the full power of Standard Notes. While the basic free account is pretty much a highly-secure, cloud-synced plain text editor, the extensions available only through the paid subscription can give Evernote a run for its money. Use markdown to add formatting and images while retaining the note’s plain text format, integrate with services like GitHub, and back up to other cloud services of your choice. You can even change the way the app looks with themes, something Evernote still hasn’t been able to provide.

That said, the default appearance of the Standard Notes desktop app, especially for free users, is pretty sparse. It follows the same three-column layout that Evernote uses, with the notebooks at the left most column, followed by the list of notes, and finally the contents of the notes. Unless you pony up some cash, that’s pretty much all you can have.

For some who need a quick and secure, no-frills note-taking service, Standard Notes’ performance and presence on all platforms, including the web, is more than enough. You can definitely extend its power, but the ability to even do basic formatted text is sadly hidden behind a subscription and can be a deal breaker, especially considering how much it’s asking for each month. Then again, that fee does bring more than pretty text and images but more peace of mind via backups and version history and Standard Notes’ free tier still offers more freedom and security than Evernote ever did.

## Zelda Breath Of The Wild 2 Release Date – Breath Of The Wild 2


The Legend of Zelda Breath of the Wild 2 Release Date and Speculations

The latest from the Zelda BOTW 2 release is the appearance of several retailer placeholder pages on Amazon US, Amazon UK, and Argos in the UK.

Zelda Breath of the Wild 2 has been one of the most anticipated Switch games for a very long time, and players are foaming at the mouth in excitement to play it. It’s not often that Legend of Zelda games get direct sequels, so when players found out that one of their favourite Zelda games would be getting a Switch sequel, there was, needless to say, a wave of joy across the Zelda fandom.

The Legend of Zelda Breath of the Wild was critically acclaimed, in fact many players bought the Switch solely to play the game. Thus, it’s no surprise just how excited players and critics are to get their hands on Zelda BOTW 2.

The Legend of Zelda Breath of the Wild 2 doesn’t have an official release date just yet. But the recent Nintendo Direct at least narrowed it down somewhat.

According to various rumours and expectations, people are starting to think that Breath of the Wild 2 will see a Fall 2023 release. At the very least this has narrowed down the window of release from 2023 to a more precise time.

Zelda BOTW 2 Rumours and Theories

We’re quite certain that the official title won’t be Breath of the Wild 2. The Zelda series tends to have unique titles, not ‘direct sequel’ names (Ocarina of Time and Majora’s Mask are a prime example). It’s worth noting, though, that the title hasn’t been released because, according to Nintendo Treehouse’s Bill Trinen, it contains clues and possible spoilers to the story. Exciting though, right? But in the meantime, we’ll just call it BOTW 2.

However, when it comes to story, who knows what’s going on! There have been some popular theories circulating the web:

Time Travel. The music at the start of the 2023 BotW 2 trailer sounds reversed, which led some to believe that this is a clue; perhaps the BotW 2 story starts by going back in time. Not to mention, the game series hasn’t been averse to time travel plots.

The two Links. This theory is strengthened by the 2023 BotW 2 trailer, as we see Link in the ‘normal’ Hyrule wearing his traditional clothing and hairstyle, while Link in the floating island realm is wearing a new set of clothes and loose hair. Did he just change his look, or is floating island Link actually Link from the past?

Secret Zelda. Again, floating island Link might not be Link at all. The loose hair is not too long, and we already saw Zelda with shorter loose hair in the 2023 trailer. It’s not unlikely that BotW 2 will feature both Link and Zelda as playable characters.

Whatever happens, we’re certain it’ll be absolutely phenomenal. We’ll update you all as we go along. Can’t wait!

## A Beginner’s Guide Bayesian Inference

This article was published as a part of the Data Science Blogathon.

Introduction

• Classical

• Frequentist

• Bayesian

Let’s understand the differences among these 3 approaches with the help of a simple example.

Suppose we’re rolling a fair six-sided die and we want to ask what is the probability that the die shows a four? Under the Classical framework, all the possible outcomes are equally likely i.e., they have equal probabilities or chances. Hence, answering the above question, there are six possible outcomes and they are all equally likely. So, the probability of a four on a fair six-sided die is just 1/6. This Classical approach works well when we have well-defined equally likely outcomes. But when things get a little subjective then it may become a little complex.

On the other hand, the Frequentist definition requires us to have a hypothetical infinite sequence of a particular event and then to look at the relative frequency in that hypothetical infinite sequence. In the case of rolling a fair six-sided die, if we roll it an infinite number of times, then 1/6th of the time we will get a four and hence the probability of rolling a four on a six-sided die will be 1/6 under the frequentist definition as well.
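An infinite sequence of rolls is only hypothetical, but a long finite one already shows the idea. The sketch below simulates a fair six-sided die with Python's standard `random` module and reports the relative frequency of a four for increasing numbers of rolls; the function name and the seed are arbitrary choices for illustration.

```python
import random

def relative_frequency_of_four(rolls: int, seed: int = 0) -> float:
    """Simulate `rolls` throws of a fair six-sided die; return the share of fours."""
    rng = random.Random(seed)
    return sum(rng.randint(1, 6) == 4 for _ in range(rolls)) / rolls

for rolls in (100, 10_000, 100_000):
    print(rolls, relative_frequency_of_four(rolls))
```

As the number of rolls grows, the relative frequency should settle near 1/6 ≈ 0.1667.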

Now suppose we proceed a little further and ask whether our die is fair or not. Under the frequentist paradigm, the probability is either zero (when it’s not a fair die) or one (when it is a fair die), because under the frequentist approach everything is measured from a physical perspective and hence the die is either fair or not. We cannot assign a probability to the fairness of the die. Frequentists are very objective in how they define probabilities, but their approach cannot give intuitive answers for some of the deeper subjective issues.

The Bayesian perspective allows us to incorporate personal belief or opinion into the decision-making process. It takes into account what we already know about a particular problem, even before seeing any empirical evidence. Here we also have to acknowledge that my personal belief about a certain event may differ from someone else’s, and hence the outcome we get using the Bayesian approach may also differ.

For example, I may say that there is a 90% probability that it will rain tomorrow, whereas my friend may say there is a 60% chance that it will rain tomorrow. So the Bayesian perspective is an inherently subjective approach to probability, but it gives more intuitive results than the Frequentist approach while staying within a mathematically rigorous framework. Let’s discuss this in detail in the following sections.

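What is Bayes’ Theorem?

Simplistically, for an unknown parameter θ and observed data X, Bayes’ theorem can be expressed through the following mathematical equation:

$$P(\theta \mid X) = \frac{P(X \mid \theta)\, P(\theta)}{P(X)}$$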

Now let’s focus on the 3 components of the Bayes’ theorem

• Prior

• Likelihood

• Posterior

• Prior Distribution – This is the key factor in Bayesian inference which allows us to incorporate our personal beliefs or own judgements into the decision-making process through a mathematical representation. Mathematically speaking, to express our beliefs about an unknown parameter θ we choose a distribution function called the prior distribution. This distribution is chosen before we see any data or run any experiment.

How do we choose a prior? Theoretically, we define a cumulative distribution function for the unknown parameter θ. Note that events with a prior probability of zero will have a posterior probability of zero, and events with a prior probability of one will have a posterior probability of one, whatever the data say. Hence, a good Bayesian framework will not assign a prior probability of exactly 0 or 1 to an event unless it is already known to have occurred or known not to occur. A handy and widely used technique for choosing priors is to pick a family of distribution functions that is flexible enough for some member of the family to represent our beliefs. Now let’s understand this concept a little better.

i. Conjugate Priors – Conjugacy occurs when the posterior distribution belongs to the same family of probability distributions as the prior, but with new parameter values that have been updated to reflect the new evidence. Examples are the Beta-Binomial, Gamma-Poisson, and Normal-Normal pairs.

ii. Non-conjugate Priors – It is also quite possible that our personal belief cannot be expressed in terms of a suitable conjugate prior. In those cases, simulation tools are applied to approximate the posterior distribution. An example is the Gibbs sampler.

iii. Uninformative Priors – Another approach is to minimize the amount of information that goes into the prior in order to reduce the bias. This is an attempt to let the data have maximum influence on the posterior. These priors are known as uninformative priors; in these cases, the results may be quite similar to those of the Frequentist approach.
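To make the conjugate case concrete, here is a minimal sketch of the Beta-Binomial pair mentioned above. It relies on the standard result that a Beta(α, β) prior combined with binomial data gives a Beta(α + heads, β + tails) posterior. The prior parameters and the observed counts are made-up numbers for illustration only.

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Return the Beta posterior parameters after observing heads and tails."""
    return alpha + heads, beta + tails

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior belief: Beta(2, 2), centred on a fair coin.
prior = (2.0, 2.0)

# Hypothetical data: 7 heads and 3 tails.
posterior = beta_binomial_update(*prior, heads=7, tails=3)

print("prior mean:", beta_mean(*prior))
print("posterior parameters:", posterior)
print("posterior mean:", beta_mean(*posterior))
```

The posterior stays in the Beta family, so updating a belief only requires adding counts to the two parameters; no simulation is needed.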

• Likelihood – Suppose θ is the unknown parameter that we are trying to estimate. Let’s represent the fairness of a coin with θ, the probability that a flip lands ‘head’. To check the fairness, we flip the coin repeatedly; each flip comes up either ‘head’ or ‘tail’, and we record a 1 or 0 accordingly. These are known as Bernoulli trials. We want the probability of all the outcomes X1, …, Xn taking the observed values x1, …, xn given a value of θ. We view the outcomes as independent, so we can write this in product notation. This is the probability of observing the actual data that we collected (heads or tails), conditioned on a value of the parameter θ (fairness of the coin), and can be expressed as follows:

$$L(\theta) = P(X_1 = x_1, \ldots, X_n = x_n \mid \theta) = \prod_{i=1}^{n} \theta^{x_i} (1 - \theta)^{1 - x_i}$$

This is the concept of likelihood: the density function thought of as a function of θ. To maximize the likelihood, i.e., to find the parameter value under which the data we observed are most probable, we choose the θ that gives the largest value of the likelihood. This is referred to as the maximum likelihood estimate, or MLE. Additionally, a quick reminder: the generalization of the Bernoulli to N repeated, independent trials is the Binomial distribution. We will see its application later in the article.
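As a rough illustration of maximizing the likelihood, the sketch below evaluates the Bernoulli log-likelihood on a grid of candidate θ values and picks the largest. The flip data are hypothetical, and the grid search is just one simple way to locate the maximum; for Bernoulli data the MLE is known to equal the sample mean, which the code uses as a check.

```python
import math

def bernoulli_log_likelihood(theta, flips):
    """Log-likelihood of independent 0/1 flips for a head probability theta."""
    return sum(math.log(theta) if x == 1 else math.log(1 - theta) for x in flips)

# Hypothetical data: 1 = head, 0 = tail (7 heads out of 10 flips).
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

# Candidate values of theta strictly between 0 and 1.
grid = [i / 1000 for i in range(1, 1000)]
theta_mle = max(grid, key=lambda t: bernoulli_log_likelihood(t, flips))

sample_mean = sum(flips) / len(flips)
print("grid-search MLE:", theta_mle)
print("sample mean:", sample_mean)
```

Working with the log of the likelihood does not change where the maximum is, and it avoids multiplying many small probabilities together.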

Mechanism of Bayesian Inference:

The Bayesian approach treats probability as a degree of belief about a certain event given the available evidence. In Bayesian learning, θ is assumed to be a random variable. Let’s understand the Bayesian inference mechanism a little better with an example.

Inference example using Frequentist vs Bayesian approach: Suppose my friend challenged me to take part in a bet where I need to predict whether a particular coin is fair or not. She told me, “Well, this coin turned up ‘Head’ 70% of the time when I flipped it several times. Now I am giving you a chance to flip the coin 5 times, and then you have to place your bet.” I flipped the coin 5 times; head came up twice and tail came up thrice. At first, I thought like a frequentist.

So, θ is an unknown parameter which is a representation of fairness of the coin and can be defined as

θ = {fair, loaded}

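Additionally, I assumed that the outcome variable X (the number of heads) follows a Binomial distribution with the following functional representation, where p_θ is the probability of a head (0.5 for a fair coin and, going by my friend’s claim, 0.7 for a loaded one):

$$f(x \mid \theta) = \binom{n}{x}\, p_\theta^{x} (1 - p_\theta)^{n - x}$$

Now in our case n=5.

Now my likelihood function will be

$$f(x \mid \theta) = \begin{cases} \binom{5}{x} (0.5)^{5} & \text{if } \theta = \text{fair} \\ \binom{5}{x} (0.7)^{x} (0.3)^{5 - x} & \text{if } \theta = \text{loaded} \end{cases}$$

Now, I saw that head came up twice, so my X = 2.

$$f(X = 2 \mid \theta) = \begin{cases} 0.31 & \text{if } \theta = \text{fair} \\ 0.13 & \text{if } \theta = \text{loaded} \end{cases}$$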

Therefore, using the frequentist approach, I can conclude that the maximum likelihood estimate (MLE) is θ̂ = fair, because the observed data are more likely under a fair coin than under a loaded one.
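The two likelihood values can be checked with a few lines of code. This sketch assumes, as the friend’s 70% claim suggests, that a loaded coin lands head with probability 0.7 and a fair coin with probability 0.5; the dictionary and function names are illustrative.

```python
from math import comb

# Assumed head probabilities for the two possible states of the coin.
HEAD_PROB = {"fair": 0.5, "loaded": 0.7}

def binomial_likelihood(heads, flips, theta):
    """Probability of seeing `heads` heads in `flips` flips for a given coin type."""
    p = HEAD_PROB[theta]
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

# Observed data from the example: 2 heads in 5 flips.
likelihoods = {theta: binomial_likelihood(2, 5, theta) for theta in HEAD_PROB}
mle = max(likelihoods, key=likelihoods.get)

print(likelihoods)
print("MLE:", mle)
```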

Now comes the tricky part. If someone asks how sure I am about my prediction, I will not be able to answer that question properly. In a frequentist world, a coin is a physical object, so the probability that it is fair can only be 0 or 1, i.e., the coin is either fair or not.

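This is where the Bayesian approach helps. Given what my friend told me about the coin, suppose that before flipping it I strongly believe the coin is loaded. Therefore, my prior is P(loaded) = 0.9, and so P(fair) = 0.1. I can now update my prior belief with data and get the posterior probability using Bayes’ Theorem.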

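My numerator calculation, the likelihood multiplied by the prior (with a head probability of 0.5 for the fair coin and 0.7 for the loaded coin), will be as follows:

$$f(x \mid \theta)\, f(\theta) = \binom{5}{x} \left[ (0.5)^{5} (0.1)\, \mathbb{1}_{\{\theta = \text{fair}\}} + (0.7)^{x} (0.3)^{5 - x} (0.9)\, \mathbb{1}_{\{\theta = \text{loaded}\}} \right]$$

The denominator is a constant and can be calculated as the expression below. Please note that here we are basically summing the numerator over all possible values of θ, of which there are only 2 in this case, i.e., fair or loaded.

$$\sum_{\theta} f(x \mid \theta)\, f(\theta) = \binom{5}{x} \left[ (0.5)^{5} (0.1) + (0.7)^{x} (0.3)^{5 - x} (0.9) \right]$$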

Hence, after replacing X with 2 we can calculate the Bayesian probability of the coin being loaded or fair. Do it yourself and let me know your answer! However, you will realize that this conclusion contains more information to make a bet than the frequentist approach.
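If you would like to check your hand calculation, the update can be sketched in code. The prior P(loaded) = 0.9 comes from the example; the head probability of 0.7 for a loaded coin is an assumption based on the friend’s claim, and the names used here are illustrative.

```python
from math import comb

# Assumed head probabilities and the prior beliefs from the example.
HEAD_PROB = {"fair": 0.5, "loaded": 0.7}
PRIOR = {"fair": 0.1, "loaded": 0.9}

def posterior(heads, flips):
    """Apply Bayes' theorem over the two possible values of theta."""
    # Numerator: likelihood times prior, for each value of theta.
    numerators = {
        theta: comb(flips, heads) * p**heads * (1 - p) ** (flips - heads) * PRIOR[theta]
        for theta, p in HEAD_PROB.items()
    }
    # Denominator: the numerators summed over all values of theta.
    evidence = sum(numerators.values())
    return {theta: value / evidence for theta, value in numerators.items()}

post = posterior(heads=2, flips=5)
for theta, prob in post.items():
    print(f"P({theta} | X = 2) = {prob:.4f}")
```

Changing the values in `PRIOR` shows how strongly the conclusion depends on the belief held before flipping, which is exactly the subjectivity discussed earlier.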

Application of Bayesian Inference in financial risk modeling:

Bayesian inference has found application in various widely used algorithms, e.g., regression, Random Forest, and neural networks. Apart from that, it has also gained popularity in the operational risk modelling of several banks. A bank’s operational loss data typically contains some loss events with low frequency but high severity. For these low-frequency cases, Bayesian inference turns out to be useful, as it does not require a lot of data.

Earlier, Frequentist methods were used for operational risk models, but they cannot express uncertainty about the parameters. Bayesian inference is considered more informative because it can combine expert opinion with actual data to derive the posterior distributions of the severity and frequency distribution parameters. Generally, for this type of statistical modeling, the bank’s internal loss data is divided into several buckets, the loss frequency of each bucket is determined by expert judgment, and the results are then fitted to probability distributions.

Hello! I am Ananya. I have a degree in Economics and I have been working as a financial risk analyst for the last 5 years. I am also a voracious reader of Data Science Blogs just as you are. This is my first article for Analytics Vidhya. Hope you found this article useful.

The media shown in this article are not owned by Analytics Vidhya and are used at the Author’s discretion.

## Btc Proxy Launches Bitcoin Farming As Part Of A Major Expansion Of The Protocol

Leading DeFi protocol BTC PROXY today unveiled the Bitcoin Yield Farm, which goes live on October 14th, 2023, and fulfills a vision of allowing the community to earn DeFi yields in BTC.

BTCProxy’s farming innovation is the first in the sector rewarding BTC for staking the protocol’s token. This is a major leap forward for BTC DeFi, and no other project offers yield farming of Bitcoin via a permissionless, decentralized protocol in this way.

The protocol was designed to favor BTC holders, as they are offered insured custody of their coins while participating in the yield program through partnerships with leading custodians such as Binance Custody, Gemini Trust, Hex Trust, and Fireblocks.

Users deposit BTC to custody and mint a BTCpx equivalent on Polygon, which can then be used to purchase a Bond. The benefit of acquiring PRXY via a Bond rather than purchasing a PRXY token on an open exchange is that there is no slippage and the Bond offers an incentivizing yield. The ROI on the BTCpx bond, which has a 5-day maturity, will be 8% for this promotional period.

PRXY is then staked in the farming contract, in one of three tiers starting at a minimum of 1000 PRXY. BTCpx earned from this stake can be claimed each block, where it can be re-bonded for compound returns or redeemed for BTC.

Until today, the specific tiers and rates of return have been a closely guarded secret within DeFi. However, today’s announcement has confirmed attractive APY rates of up to 7% paid in BTC, depending on the level of PRXY staked.

Bonds will open on October 7th, and bonders will have first access to the farm through a Claim and Farm function at maturity. This provides a strong use case for the PRXY token and is the next step in building out the protocol’s ambitious vision.

The Bitcoin Farm is designed to be a sustainable protocol that only pays out BTCpx that has been earned in fees from Minting and Redeeming, from the sale of Bonds, and the soon to be launched: Redux — Interest-Free Borrowing with BTC Collateralization.

With over 30 institutions onboarded to the protocol already, including majors like chúng tôi the possibilities for #RealYield keeps growing. Binance, recently on-boarded as a custodian, has also shown its faith in the project and it is hoped the partnership doesn’t stop at just custody. It’s worth noting that wrapping of BTC onto BNB Chain and thus the immersion of BTC into BNB chain DeFi is lacking.

Recent events have given pause to many’s thinking that CeFi is in any way superior to DeFi. The transparency of DeFi allows protocol participants to see exactly where the #RealYield is being earnt.

Today’s announcement sets out two new safe and secure routes that BTC holders can use to generate returns. It positions BTC PROXY as the safest, fastest, and most efficient protocol to unlock the value of Bitcoin. The developing protocol focuses on value accrual to the $PRXY token and coming protocol developments look to continue that trend via #RealYield.

Donnie Kim, BTC Proxy CEO said:

“At BTC PROXY, we are focused on making Bitcoin work for holders, work for DeFi, and work for the wider community. Bitcoin is great at storing value but unlocking and using that value is not easy. Today’s announcement of Bonding and Farming on our protocol is a major step forward for BTC holders looking for reliable and assured ways to put their BTC to work.

As a protocol, we have always been clear that our solutions should be trusted, safe, and efficient. Centralized services have well-documented risks. These have led to recent well-publicized failures that impact on everyone in the sector.

BTC PROXY believes that BTC holders should not have to ‘cross their fingers’ when investing their assets. Our insured custody model guarantees the safety of investments, making our new Bonding and Farming functions the perfect choice for individuals and institutions alike.

We will continue to innovate to find ways to unlock Bitcoin’s value and play our role in creating a trusted, safe and efficient DeFi ecosystem.”

About BTC PROXY

BTC Proxy is a multi-institutional protocol for the decentralized tokenization of Bitcoin on ERC20 and MRC20 formats utilizing the Proxy Relay. This gives Bitcoin holders a decentralized bridge to stake their Bitcoin into custody and transfer that value onto Ethereum or Polygon chains without the need for centralized exchanges and systems that exponentially increase the counterparty risk of theft or loss.

BTC Proxy also allows for the transfer of value without price slippage and is independent of liquidity which is a factor that affects exchange prices on Centralized exchanges and Decentralized exchanges.

Fireblocks and Gemini are custodians of all investments. Hex Trust maintains commercial criminal insurance to safe keep client assets whilst Aon provides wider insurance coverage via a select group of insurers.

Media Enquiries: [email protected]

Web: BTCproxy.io

Twitter: @BTC_proxy
