Key Concepts in AI Safety: Reliable Uncertainty Quantification in Machine Learning
This report analyzes the core challenge of enabling machine learning systems to "know what they do not know," the difficulties posed by distribution shift, the main quantification methods, and their practical applications, providing theoretical and technical reference points for safe deployment.
Published: 23/12/2025
Key Chapters
- Introduction
- The Challenge of Reliably Quantifying Uncertainty
- Understanding Distribution Shift
- Accurately Characterizing Uncertainty
- Existing Methods for Uncertainty Quantification
  - Deterministic Methods
  - Model Ensembles
  - Conformal Prediction
  - Bayesian Inference
- Practical Considerations for Uncertainty Quantification
- Outlook
Document Introduction
The rapid development of machine learning research over the past decade has produced systems of astonishing capability whose reliability is nevertheless frequently called into question. This inconsistent performance poses significant challenges for deployment in real-world settings. Building machine learning systems that "know what they don't know," that is, systems capable of identifying and responding to situations where they are prone to error, has become an intuitive path toward solving the problem. This goal is technically framed as "uncertainty quantification," an open and widely studied research topic in machine learning.
As the fifth research report in the "Artificial Intelligence Safety" series, this report systematically introduces the working principles, core difficulties, and future prospects of uncertainty quantification. It first explains the key concept of calibration: a model is well calibrated when the confidence it attaches to a prediction matches the empirical frequency with which such predictions turn out to be correct. Calibration curves illustrate three model states (underconfident, well calibrated, and overconfident), and a medical image diagnosis example demonstrates the practical value of a well-calibrated system.
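A reliability curve or expected calibration error (ECE) makes this notion concrete. The following minimal Python sketch is illustrative only: the binning scheme, function names, and simulated data are assumptions, not taken from the report. It bins predictions by confidence and compares each bin's average confidence with its empirical accuracy.

```python
# Sketch: measuring calibration with expected calibration error (ECE).
# Binning choices and the simulated data are illustrative assumptions.
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, compare mean confidence to empirical
    accuracy in each bin, and return the sample-weighted average gap."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        gap = abs(confidences[mask].mean() - correct[mask].mean())
        ece += mask.mean() * gap  # weight by the fraction of samples in this bin
    return ece

# A well-calibrated model: among predictions made with ~80% confidence,
# roughly 80% are correct, so the ECE is close to zero.
rng = np.random.default_rng(0)
conf = rng.uniform(0.5, 1.0, size=10_000)
correct = rng.random(10_000) < conf  # simulated outcomes that match confidence
print(f"ECE of a simulated well-calibrated model: {expected_calibration_error(conf, correct):.3f}")
```

An overconfident model would show up as bins whose average confidence sits well above their empirical accuracy, which is exactly the pattern a calibration curve visualizes.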
Distribution shift is the central real-world challenge for uncertainty quantification. It refers to the gap between the data distribution a model encounters after deployment and the distribution it saw during training. Because such shifts are hard to predict, detect, and precisely define, a model that is well calibrated in the lab may still fail in complex real-world environments. At the same time, the probabilistic outputs of standard machine learning models have inherent flaws: their confidence scores are not guaranteed to track actual accuracy, and they struggle to express "none of the above" for inputs outside the known classes, which further compounds the difficulty of quantification.
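The "none of the above" limitation is easy to see in code. In the sketch below, the three-class setup and the logit values are hypothetical; the point is only that a softmax layer must distribute exactly one unit of probability across its known classes, so even an input that belongs to none of them can receive a confident-looking prediction.

```python
# Sketch: why plain softmax outputs cannot express "none of the above".
# The classifier, its classes, and the logits are synthetic illustrations.
import numpy as np

def softmax(logits):
    z = logits - logits.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Logits produced by a hypothetical 3-class classifier for an input that
# belongs to none of its classes. The probabilities still sum to 1, so the
# model is forced to place all of its belief on known classes.
ood_logits = np.array([4.0, 0.5, -1.0])
probs = softmax(ood_logits)
print(probs, probs.sum())  # roughly [0.96, 0.03, 0.01], summing to 1.0
```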
The report details four mainstream uncertainty quantification methods: deterministic methods, model ensembles, conformal prediction, and Bayesian inference, analyzing the technical principles, advantages, and limitations of each. Deterministic methods train a model to report high uncertainty on inputs unlike its training data, but they struggle to cover the full range of complex real-world scenarios. Model ensembles improve accuracy and uncertainty estimates by combining the predictions of multiple models, but they lack a universal validation mechanism. Conformal prediction offers mathematical reliability guarantees, but these rest on the assumption that no distribution shift occurs. Bayesian inference provides a theoretically rigorous framework, but it is difficult to implement exactly in modern machine learning models.
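To make one of these methods concrete, the following sketch implements split conformal prediction for classification under the assumption that the calibration and test data are exchangeable (i.e., no distribution shift); the function names, nonconformity score, and simulated data are illustrative choices, not taken from the report.

```python
# Sketch of split conformal prediction for classification, assuming
# exchangeable calibration and test data. All names are illustrative.
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute a score threshold from a held-out calibration set so that
    prediction sets contain the true label with probability >= 1 - alpha."""
    n = len(cal_labels)
    # Nonconformity score: 1 minus the probability assigned to the true class.
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected quantile index.
    k = int(np.ceil((n + 1) * (1 - alpha))) - 1
    return scores[min(k, n - 1)]

def prediction_set(probs, q):
    """Return every class whose nonconformity score falls below the threshold."""
    return np.where(1.0 - probs <= q)[0]

# Usage with made-up softmax outputs for a 4-class problem:
rng = np.random.default_rng(1)
cal_probs = rng.dirichlet(np.ones(4), size=500)
cal_labels = np.array([rng.choice(4, p=p) for p in cal_probs])
q = conformal_threshold(cal_probs, cal_labels, alpha=0.1)
test_probs = rng.dirichlet(np.ones(4))
print(prediction_set(test_probs, q))  # set of plausible labels, possibly more than one
```

Under the exchangeability assumption the returned sets cover the true label with probability at least 1 - alpha; once the deployment distribution drifts away from the calibration data, that guarantee no longer holds, which is precisely the limitation the report highlights.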
At the level of practical application, uncertainty quantification methods can serve as add-on components to standard training pipelines, adding a safety layer to deployed systems. However, human-machine interaction must be designed carefully so that human operators can interpret and act on uncertainty estimates effectively. It is also essential to recognize that existing methods are not universal solutions: relying on uncertainty estimates must not breed false confidence, and system design must fully account for unknown risks.
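As a sketch of such an add-on safety layer, the wrapper below returns the model's prediction only when its confidence clears a threshold and otherwise flags the case for human review; the scikit-learn-style predict_proba interface and the 0.9 threshold are assumptions made purely for illustration.

```python
# Sketch of a "defer to human" safety layer around an existing classifier.
# The predict_proba interface and threshold value are illustrative assumptions.
import numpy as np

DEFER_TO_HUMAN = "defer_to_human"

def predict_or_defer(model, x, threshold=0.9):
    """Return the model's label when it is confident enough,
    otherwise flag the case for human review."""
    probs = np.ravel(model.predict_proba(x))  # class probabilities for one input (assumed API)
    confidence = float(probs.max())
    if confidence < threshold:
        return DEFER_TO_HUMAN, confidence
    return int(probs.argmax()), confidence
```

A wrapper like this only adds safety if the underlying confidence scores are reasonably calibrated and if operators know how to act on a deferral, which is why the report stresses interaction design alongside the quantification method itself.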
Although reliably quantifying uncertainty faces fundamental challenges, and a guarantee that a system always "knows what it doesn't know" may never be achievable, research in this area has made significant progress in improving the reliability and robustness of machine learning systems. Going forward, the work is expected to shift from basic research toward practical engineering challenges and to play a key role in enhancing the safety, reliability, and interpretability of AI systems such as large language models.