Bridging Probability Theory and Statistical Estimation

Bridging Probability Theory and Statistical Estimation is a summary of an exercise on the precision of estimating the mean value of a quantity from independent measurements.

The result of every measurement is assumed to be a real number \(X\), independently drawn from a normal distribution centered at an unknown "true value" \(X_0\) with unknown variance \(S^2\).

This note presents a mathematically grounded interpretation of the uncertainty in an estimate \(\bar{X}\) of an unknown true value \(X_0\), using a probability density function.

The approach builds a bridge between Kolmogorov-style probability theory and practical statistical inference — without requiring commitment to Bayesian or frequentist ideology.

The goal of this article is to propose precise terminology and to eliminate ambiguous terms that often lead to confusion.

Introduction

Editors and practitioners often face vague questions like:

“How precise is your estimate?”
“What’s the accuracy of your estimate?”
“What’s the error of your estimate?”
“How confident are you in your estimate?”
“What’s the possible deviation of your estimate from the true value?”

These questions must be reformulated as requests for clearly defined statistical quantities.

This task is challenging due to widespread misunderstandings and long-standing confusions, which persist even in the 21st century. Some of them are discussed in publications [1][2][3][4][5].

Here, we do not delve into the popular confusions, but provide simple formulas that, we hope, help to avoid these confusions and misinterpretations.

Model Assumptions

There is an unknown true value \(X_0 \in \mathbb{R}\) to be estimated.

The Expert performs \(N\) independent measurements \(X_1, X_2, \dots, X_N\), modeled as \[ X_i \sim \mathcal{N}(X_0, S^2) \] with both \(X_0\) and \(S\) unknown.

The Expert computes the following quantities:

Sample mean (point estimate): \[ \bar{X} = \frac{1}{N} \sum_{i=1}^N X_i \]

Sample standard deviation: \[ s = \sqrt{ \frac{1}{N - 1} \sum_{i=1}^N (X_i - \bar{X})^2 } \]

Naive standard error: \[ c_N = \frac{s}{\sqrt{N}} \]
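
As an illustration, these quantities can be computed in a few lines of Python (a minimal sketch, assuming NumPy; the true parameters \(X_0\) and \(S\) appear only to simulate the data and remain unknown to the Expert):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative true parameters; in practice they are unknown to the Expert
# and appear here only to simulate the measurements.
X0, S, N = 10.0, 2.0, 5

X = rng.normal(X0, S, size=N)   # N independent measurements

X_bar = X.mean()                # sample mean (point estimate)
s = X.std(ddof=1)               # sample standard deviation, 1/(N-1) normalization
c_N = s / np.sqrt(N)            # naive standard error
print(X_bar, s, c_N)
```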

Anticipating the main result, it can be mentioned that at small \(N\), the Naive standard error \(c_N\) underestimates the uncertainty of the estimate \(\bar{X}\) of the "true mean value" \(X_0\).

Under the model assumptions, the estimate of the uncertainty that is unbiased in expectation (in the sense specified below) is

\[ \sigma_N = \sqrt{\frac{N-1}{N-3}\ }\cdot c_N \]

It is considered below.

Likelihood-Like Density

Let

\[ f_N(x) = \frac{1}{c_N} \cdot \mathrm{Student}_{N-1}\!\left( \frac{x - \bar{X}}{c_N} \right) \]

This is the probability density function of a Student’s t-distribution with \(N\!-\!1\) degrees of freedom, centered at the sample mean \(\bar{X}\) and scaled by the standard error \(c_N\).
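
In code, this density can be expressed through the loc and scale parameters of scipy.stats.t (a minimal sketch; the numeric values are illustrative placeholders for the quantities computed above):

```python
from scipy.stats import t

N, X_bar, c_N = 5, 10.3, 0.9    # illustrative values of the computed quantities

# f_N: Student's t density with N-1 degrees of freedom,
# centered at X_bar and scaled by c_N.
def f_N(x):
    return t.pdf(x, df=N - 1, loc=X_bar, scale=c_N)

print(f_N(10.3))                # density at the sample mean
```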

Function \(f_N\) acts as a confidence distribution density (or data-driven predictive density) for the value \(X_0\), under the model assumptions, conditional on the observed data.

One may interpret

\[ \int_A^B f_N(x) \, \mathrm dx \]

as the probability that \(X_0\in(A,B)\), given the data and the modeling assumptions.
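
Numerically, such an integral reduces to a difference of Student's t cumulative distribution function values (a minimal sketch, assuming SciPy; the interval bounds and sample quantities are illustrative):

```python
from scipy.stats import t

N, X_bar, c_N = 5, 10.3, 0.9    # illustrative values, as above
A, B = 9.0, 11.0                # interval of interest

# Probability that X_0 lies in (A, B), given the data and the model:
p = t.cdf(B, df=N - 1, loc=X_bar, scale=c_N) \
    - t.cdf(A, df=N - 1, loc=X_bar, scale=c_N)
print(p)
```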

It answers questions like:

"How close is \(\bar{X}\) likely to be to the true value \(X_0\)?"

"What’s the uncertainty of the estimate?"

Mean Square Width (Expected Squared Error)

The expected squared deviation is: \[ \sigma_N^2 = \int_{-\infty}^{\infty} (x - \bar{X})^2 \ f_N(x) \ \mathrm d x \]

which gives \[ \sigma_N=\sqrt{\frac{N-1}{N-3}}\;c_N \]
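
This value follows from the substitution \(t=(x-\bar{X})/c_N\): the second moment of the Student's t-distribution with \(\nu\) degrees of freedom equals \(\nu/(\nu-2)\) for \(\nu>2\), so with \(\nu=N-1\)

\[ \sigma_N^2 = c_N^2 \int_{-\infty}^{\infty} t^2\, \mathrm{Student}_{N-1}(t)\, \mathrm d t = c_N^2 \cdot \frac{N-1}{(N-1)-2} = \frac{N-1}{N-3}\, c_N^2 \]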

This is the corrected standard error. It is unbiased in the following sense: \[ \mathbb{E}\!\left[ \frac{(\bar{X} - X_0)^2}{\sigma_N^2} \right] = 1 \] since the ratio \((\bar{X}-X_0)/c_N\) follows the Student's t-distribution with \(N-1\) degrees of freedom, whose second moment is \(\frac{N-1}{N-3}\).

In this way, \(\sigma_N\) properly accounts for small-sample variability.

The correction factor \( \sqrt{(N{-}1)/(N{-}3)} \) arises from computing the second moment of the Student Distribution and accounts for extra variability in \(s\) due to estimating \(S\).

However, \(\sigma_N\) makes sense only for \(N>3\): the second moment of the Student's t-distribution with \(N-1\) degrees of freedom is finite only for \(N-1>2\).

The probability density function \(f_N\) itself makes sense for any integer \(N>1\).
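
The unbiasedness property above can be illustrated with a quick Monte Carlo experiment (a minimal sketch, assuming NumPy; the true parameters, \(N>3\), and the number of trials are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X0, S, N = 10.0, 2.0, 10        # illustrative true parameters; N must exceed 3
trials = 200_000

# E[(X_bar - X0)^2 / sigma_N^2] should approach 1.
r = np.empty(trials)
for k in range(trials):
    X = rng.normal(X0, S, size=N)
    c_N2 = X.var(ddof=1) / N                 # c_N squared
    sigma_N2 = (N - 1) / (N - 3) * c_N2      # sigma_N squared
    r[k] = (X.mean() - X0) ** 2 / sigma_N2

print(r.mean())                 # should be close to 1
```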

Future Measurement Expectation

In this section, an additional question is considered:

What value should the Expert expect for a new, similar, independent measurement \((N{+}1)\)?

Given the data and assuming the same measurement process, the predictive distribution for \(X_{N+1}\) has the conditional probability density \(g\), expressed as follows:

\[ g(x)= \frac{1}{s \cdot \sqrt{1 + \frac{1}{N} }} \mathrm{Student}_{N-1}\left(\frac{x-\bar{X}}{s \cdot \sqrt{1 + \frac{1}{N}}} \right) \]
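
A sketch of this predictive density, again via the loc and scale parameters of scipy.stats.t (the sample values are illustrative):

```python
import numpy as np
from scipy.stats import t

N, X_bar, s = 5, 10.3, 2.0                  # illustrative sample values
scale = s * np.sqrt(1 + 1 / N)              # naive predictive scale

# g: predictive density for the next measurement X_{N+1}.
def g(x):
    return t.pdf(x, df=N - 1, loc=X_bar, scale=scale)
```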

This distribution density reflects both the randomness of the next measurement and the uncertainty in estimating \(X_0\).

The expected deviation for the next measurement:

\[ \sqrt{\int_{-\infty}^{\infty} g(x) \ (x-\bar X)^2 \ \mathrm d x} = \sqrt{\frac{N-1}{N-3}} \cdot s \cdot \sqrt{1 + \frac{1}{N}} \]
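
This identity can be checked numerically, since the left-hand side is just the root of the second moment of the scaled t density (a sketch, assuming SciPy; the sample values are illustrative):

```python
import numpy as np
from scipy.stats import t
from scipy.integrate import quad

N, X_bar, s = 5, 10.3, 2.0                  # illustrative sample values
scale = s * np.sqrt(1 + 1 / N)

g = lambda x: t.pdf(x, df=N - 1, loc=X_bar, scale=scale)

# Root-mean-square deviation of the next measurement, by direct quadrature:
m2, _ = quad(lambda x: (x - X_bar) ** 2 * g(x), -np.inf, np.inf)
rms = np.sqrt(m2)

closed_form = np.sqrt((N - 1) / (N - 3)) * scale
print(rms, closed_form)                     # the two values should agree
```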

This result shows that the next measurement is expected to vary more than either the sample standard deviation \(s\) or the standard error \(c_N\), due to the compounded uncertainty.

Use of this estimate helps avoid overconfidence in forecasting, especially with small \(N\), and acknowledges that even a fixed “true value” \(X_0\) does not guarantee low variance in future data.

Practical Significance

The consideration above:

is defined within classical probability theory,

requires no specific ideology,

corrects the naive standard error \(\displaystyle \ c_N = \frac s{\sqrt N} \ \), which underestimates the uncertainty at small \(N\).

In the limit \(N \to \infty\), the Student's t-distribution converges to the normal distribution, and the naive standard error \(c_N\) becomes an increasingly accurate estimate of the uncertainty.
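
For example, the correction factor decays quickly with growing \(N\):

\[ \sqrt{\frac{N-1}{N-3}} \approx 1.41 \ \text{at}\ N=5, \qquad \approx 1.13 \ \text{at}\ N=10, \qquad \approx 1.01 \ \text{at}\ N=100 \]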

Table of comparison:

Quantity | Formula | Interpretation
\(c_N\) | \(\displaystyle\frac{s}{\sqrt{N}}\) | Naive standard error of the mean
\(\sigma_N\) | \(\displaystyle\sqrt{\frac{N-1}{N-3}} \cdot c_N\) | Corrected standard error accounting for small \(N\)
Naive predictive scale | \(\displaystyle s \cdot \sqrt{1 + \frac{1}{N}}\) | Scale used in \(g(x)\) (not the full spread)
Predictive spread | \(\displaystyle \sqrt{\frac{N-1}{N-3}} \cdot s \cdot \sqrt{1 + \frac{1}{N}}\) | Expected root-mean-square deviation of the next measurement
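
The four quantities from the table can be evaluated together (a minimal sketch, assuming NumPy; \(N\) and \(s\) are illustrative):

```python
import numpy as np

N, s = 5, 2.0                               # illustrative sample values
corr = np.sqrt((N - 1) / (N - 3))           # small-sample correction factor

c_N = s / np.sqrt(N)                        # naive standard error
sigma_N = corr * c_N                        # corrected standard error
naive_pred = s * np.sqrt(1 + 1 / N)         # naive predictive scale
pred_spread = corr * naive_pred             # predictive spread

print(c_N, sigma_N, naive_pred, pred_spread)
```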

Notably, the same correction factor \(\sqrt{\frac{N - 1}{N - 3}}\) appears both in the standard error of the sample mean \(\bar{X}\) and in the expected deviation of a future measurement \(X_{N+1}\). This factor arises due to the variance of the Student’s t-distribution with \(N-1\) degrees of freedom, and corrects for the additional uncertainty from estimating the unknown \(S\) with measured \(s\).

The predictive density \(g(x)\) is often presented using the naive scale \(s \cdot \sqrt{1 + \frac{1}{N}}\). However, the actual root-mean-square deviation includes the same correction factor as the corrected standard error of the mean.

Originality

The formulas above are not original.

Various authors have discussed common misinterpretations and misleading usages related to quantifying the precision of an estimate of the mean value from a set of independent measurements [1][2][3][4][5][6].

We have tried to compile the results in the most concise and compact, yet correct and internally consistent, form.

Conclusion

This construction shows how statistical estimation can be framed as probabilistically coherent predictive inference, even within a non-Bayesian or fully deterministic worldview.

This framework enables Experts to translate vague or poorly-posed questions about "accuracy" or "confidence" into well-defined probabilistic statements — entirely within classical probability theory, and without requiring Bayesian priors.

All probabilistic statements here are conditional on the observed data and modeling assumptions, rather than arising from prior distributions.

Warning

Neither deduction nor proof of the formulas above is presented in this article.

However, the Editor and ChatGPT have made every effort to catch and correct all mistakes and misprints.

If you see a mistake that is not yet corrected here, please let the Editor know.

References

  1. Richard D Morey, Rink Hoekstra, Jeffrey N Rouder, Michael D Lee, Eric-Jan Wagenmakers. The fallacy of placing confidence in confidence intervals. Psychon Bull Rev. 2015 Oct 8;23:103–123. doi:10.3758/s13423-015-0947-8. https://pmc.ncbi.nlm.nih.gov/articles/PMC4742505/ "Interval estimates – estimates of parameters that include an allowance for sampling uncertainty – have long been touted as a key component of statistical analyses. There are several kinds of interval estimates, but the most popular are confidence intervals (CIs): intervals that contain the true parameter value in some known proportion of repeated samples, on average. The width of confidence intervals is thought to index the precision of an estimate; CIs are thought to be a guide to which parameter values are plausible or reasonable; and the confidence coefficient of the interval (e.g., 95 %) is thought to index the plausibility that the true parameter is included in the interval. We show in a number of examples that CIs do not necessarily have any of these properties, and can lead to unjustified or arbitrary inferences."
  2. Sander Greenland, Stephen J Senn, Kenneth J Rothman, John B Carlin, Charles Poole, Steven N Goodman, Douglas G Altman. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol. 2016 May 21;31:337–350. doi:10.1007/s10654-016-0149-3. https://pmc.ncbi.nlm.nih.gov/articles/PMC4877414/ "Misinterpretation and abuse of statistical tests, confidence intervals, and statistical power have been decried for decades, yet remain rampant. A key problem is that there are no interpretations of these concepts that are at once simple, intuitive, correct, and foolproof. Instead, correct use and interpretation of these statistics requires an attention to detail which seems to tax the patience of working scientists. This high cognitive demand has led to an epidemic of shortcut definitions and interpretations that are simply wrong, sometimes disastrously so, and yet these misinterpretations dominate much of the scientific literature. In light of this problem, we provide definitions and a discussion of basic statistics that are more general and critical than typically found in traditional introductory expositions. Our goal is to provide a resource for instructors, researchers, and consumers of statistics whose knowledge of statistical theory and technique may be limited but who wish to avoid and spot misinterpretations. We emphasize how violation of often unstated analysis protocols (such as selecting analyses for presentation based on the P values they produce) can lead to small P values even if the declared test hypothesis is correct, and can lead to large P values even if that hypothesis is incorrect. We then provide an explanatory list of 25 misinterpretations of P values, confidence intervals, and power. We conclude with guidelines for improving statistical interpretation and reporting."
  3. Psychonomic Bulletin & Review, Oct 2015. https://link.springer.com/article/10.3758/s13423-015-0947-8 "Confidence intervals are thought to index the precision of an estimate… CIs do not necessarily have any of these properties and thus cannot be used uncritically in this way." Why it's relevant: highlights the error of treating CIs as direct measures of probability about parameters.
  4. Frontiers in Psychology, 2022. https://www.frontiersin.org/articles/10.3389/fpsyg.2022.948423/full "It is given as interpretation … that any value within the 95% confidence interval could reasonably be the true value … This is a very common problem and results in 'confusion intervals.'" Why it's relevant: shows the widespread nature of this misunderstanding.
  5. Eur J Epidemiol, Apr 2016. https://pubmed.ncbi.nlm.nih.gov/27256121 "There are no interpretations … that are at once simple, intuitive, correct, and foolproof … users routinely misinterpret them (e.g. interpreting 95% CI as 'there is a 0.95 probability that the parameter is contained in the CI')." Why it's relevant: authoritative critique on core misinterpretations.
  6. arXiv, Jul 2018. https://arxiv.org/abs/1807.06217 "The so-called 'confidence curve' … may assign arbitrarily low non-zero probability to the true parameter; thus it is a misleading representation of uncertainty." Why it's relevant: supplies theoretical foundation for careful formulation of \(f_N(x)\).