In the fields of meal-assisting robotics and human–robot interaction (HRI), real-time and accurate mouth pose estimation is critical for ensuring interaction safety and improving user experience. The task is challenging because mouths vary widely in opening degree and orientation, and external factors such as lighting conditions and occlusions further degrade estimation. To address these issues, this paper proposes a novel method for point cloud fitting and posture estimation of mouth opening degrees (FP-MODs). The proposed method leverages RGB and depth images captured from a single viewpoint, integrating geometric modeling with advanced point cloud processing techniques to achieve robust and accurate mouth posture estimation. The innovation of this work lies in the hypothesis that different mouth-opening states can be effectively described by distinct geometric shapes: a closed mouth is modeled by a spatial quadric surface, a half-open mouth by a spatial ellipse, and a fully open mouth by a spatial circle. Based on these hypotheses, we develop algorithms that fit the corresponding geometric model to the point cloud extracted from the mouth region. Specifically, for the closed-mouth state, we employ least-squares optimization to fit a spatial quadric surface to the point cloud data; for the half-open and fully open states, we combine inverse projection with least-squares fitting to model the mouth contour as a spatial ellipse or a spatial circle, respectively. Finally, to evaluate the effectiveness of the proposed FP-MODs method, extensive real-world experiments were conducted under varying conditions, including different orientations and various mouth types. The results demonstrate that FP-MODs achieves high accuracy and robustness.
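The two least-squares fits named above can be illustrated with a minimal sketch. The code below is an assumption-laden illustration, not the paper's implementation: it takes the quadric model to be a height-field surface z = ax² + bxy + cy² + dx + ey + f, and fits the spatial circle by SVD plane fitting followed by an algebraic (Kåsa) circle fit in the plane; the function names and synthetic data are hypothetical.

```python
# Illustrative sketch (assumed model forms, not the paper's exact algorithms):
# 1) closed mouth: least-squares fit of a quadratic surface to (x, y, z) points;
# 2) open mouth: fit the supporting plane via SVD, project the contour points
#    into the plane, then fit a circle algebraically (Kasa method).
import numpy as np

def fit_quadratic_surface(points):
    """Least-squares fit of z = a x^2 + b x y + c y^2 + d x + e y + f.

    points: (N, 3) array of mouth-region samples. Returns (a, b, c, d, e, f).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    # Design matrix: one column per monomial of the quadratic model.
    A = np.column_stack([x**2, x * y, y**2, x, y, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, z, rcond=None)
    return coeffs

def fit_spatial_circle(points):
    """Fit a circle to 3D contour points: plane fit, then 2D circle fit.

    Returns (center_3d, radius).
    """
    centroid = points.mean(axis=0)
    P = points - centroid
    # Right singular vectors: first two span the best-fit plane,
    # the third is the plane normal.
    _, _, Vt = np.linalg.svd(P, full_matrices=False)
    u, v = Vt[0], Vt[1]
    x, y = P @ u, P @ v                      # in-plane 2D coordinates
    # Kasa fit: x^2 + y^2 = 2 a x + 2 b y + c is linear in (a, b, c).
    A = np.column_stack([2 * x, 2 * y, np.ones_like(x)])
    (a, b, c), *_ = np.linalg.lstsq(A, x**2 + y**2, rcond=None)
    radius = np.sqrt(c + a**2 + b**2)
    center = centroid + a * u + b * v
    return center, radius

if __name__ == "__main__":
    # Synthetic tilted circle, e.g. an open-mouth contour (hypothetical data).
    theta = np.linspace(0.0, 2.0 * np.pi, 100)
    e1 = np.array([1.0, 0.0, 0.0])
    e2 = np.array([0.0, np.cos(0.3), np.sin(0.3)])
    pts = np.array([1.0, 2.0, 3.0]) + 0.5 * (
        np.cos(theta)[:, None] * e1 + np.sin(theta)[:, None] * e2)
    center, radius = fit_spatial_circle(pts)
    print(center, radius)
```

The ellipse case would follow the same plane-fit-then-project structure, with a conic fitted in the plane instead of a circle.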
This study provides a theoretical foundation and technical support for improving HRI and food-delivery safety in the field of robotics.