ITMO University, Russia
Cite this as
Pimenov AV, Nazarenko NM, Efimova VA. A Review: Teeth Numbering and Classification Methods on the OPG Image. Glob J Medical Clin Case Rep. 2025:12(1):011-017. Available from: 10.17352/2455-5282.000192
Copyright License
© 2025 Pimenov AV, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Subject of research: A review of the existing teeth numbering and classification methods on images is presented. The architectural features of these methods and their practical importance are considered. The solutions in these areas are compared and the best ones are identified.
Method: To evaluate the results of the teeth numbering and classification methods, the following quality metrics were selected: IoU, average precision, and accuracy, as well as other metrics given in the reviewed studies. Attention is also paid to data pre-processing, the image sources, and the amount of data used to train and test the models. The advantages and disadvantages of each solution are considered.
Main results: Based on the study results, regression-based tooth numbering was identified as the best algorithm. This method provides high-quality teeth segmentation. Besides its superior metric values, the advantage of this approach is that it can locate missing teeth.
Practical relevance: This research can be useful for specialists in machine learning technologies, as well as for physicians conducting research on the automation of medical processes. The results of this work may be useful for implementing dental X-ray image recognition in medical assistant systems.
It can already be argued that the era of Industry 4.0 [1] is coming to an end. This era comprises automation, the completion of globalization, and laying the groundwork for Industry 5.0 [2], namely the synergy between machine and human. The most striking manifestation of this process is the widespread adoption of machine learning technologies, in particular neural networks, in manufacturing, entertainment, and analytics [3].
However, certain areas, such as critical infrastructure management [4] and medicine [5], place high demands on system reliability. In healthcare, there are already many studies and practical applications of machine learning in medical image analysis [6], monitoring and analysis of patient vital signs [7,8], pathology detection [9], assisting the doctor [10], ensuring timely prevention [11], predicting epidemiological situations [7], digitizing medical documents [12], and managing patient appointments [10]. Convolutional Neural Networks (CNN) [13] have been particularly effective in detecting various pathologies on X-rays [14-16]. However, not all areas of medicine have advanced far enough in introducing automation into work processes, and dentistry is one of them.
Methods for the automatic analysis of dental images have developed historically from mathematical methods (thresholding, active contour segmentation, level-set methods) to machine learning methods (clustering, regression, neural networks) [17,18]. The latest trend is to use CNNs, which provide the best results but require large datasets of medical images, which makes this topic difficult to study. Neural networks have not been widely used in dentistry, despite the existence of some studies and commercial offerings, such as «diagnocat»1 or «denti.ai»2. To implement such systems, datasets of thousands of images are used to identify a specific dental anomaly (caries, periapical lesions) [19,20].
-------------------------
1https://diagnocat.ru/
2https://www.denti.ai/
To solve this problem, automatic data annotation systems such as OdontoAI [21] have been developed. This system was created to label OPG images of teeth, with the aim of simplifying dental research by removing part of the data labeling work. However, this topic has not been properly developed due to several factors: the complexity of automatic dental image analysis [18], the need for specific labeling, and the need for a quality training dataset. Accordingly, data retrieval and labeling remain open problems in dental research.
In this review article, we analyze the studies that have already been conducted and compare the methods they use and the results obtained. Research using neural networks is of the greatest interest to us, but we also consider older studies from among the most peer-reviewed. There are three main tasks for neural networks in dentistry: teeth detection, teeth classification, and detection of dental anomalies (caries, pulpitis, crowns, etc.). However, these tasks are difficult to separate, since classifying a tooth on a dental X-ray image requires segmenting it first, so we consider all the studies in one section. We also analyze the existing commercial offerings on the market, their quality, and their applicability in practice. As a result of the analysis, we highlight the features of dental image processing as well as the most promising methods. In addition, we are interested in the direction of further research in this area.
The X-ray image analysis of the jaw consists of teeth segmentation, teeth classification, and identification of anomalies. The most interesting are universal methods that allow combining the solution of these tasks or solving these tasks in an integrated way.
The analysis methods can be divided into three conventional groups: mathematical methods based on various kinds of operations, methods based on neural networks, and current commercial systems for dental X-ray image analysis. This study does not consider mathematical methods on their own because they are obsolete and inferior to more modern approaches based on machine learning. Such methods remain useful only as a source of pre-processing and post-processing techniques.
For each study, we note the data sources, their format, and the dataset size. OPG X-rays are the most popular choice, being the simplest yet quite effective data source. The following types of X-ray image can be used as a data source:
Based on the results of each study we will highlight the following parameters: architecture, quality metrics, data type, size of the dataset used, information about the data source, as well as our comments on the study. If any information is unavailable, we will put ‘*’ in place of the value. All this information is summarized in Table 1.
In this section, we look at methods for analyzing dental X-ray images using neural networks. These methods may also include the mathematical operations described above.
In [21] the authors compared several of their models using the IoU and AP metrics [22] to determine the best architecture. The candidates were HTC, DetectoRS, ResNeSt Cascade R-CNN, Cascade R-CNN with DCN, ResNeSt Mask R-CNN, Cascade R-CNN, and Mask R-CNN. Based on the quality metrics, the HTC architecture with DCN based on ResNeXt101-64x4d was selected. It has an AP50 of about 0.983 (AP75 of 0.958) and an IoU of 0.802, while the other architectures performed worse, with AP50 from 0.918 to 0.982 and IoU from 0.745 to 0.780. Each architecture was trained for 150 epochs. The training dataset contains 3,600 images and the validation dataset contains 400 images. The authors evaluate deciduous and permanent teeth separately, as the model gives substantially different results for them: the IoU is 0.83 for permanent teeth and 0.69 for deciduous teeth. The study presents a table with metrics for different types of teeth and their location. In addition, the paper [21] presents a study of tooth numbering. The average AP50 for all tooth types is about 0.7 and the accuracy is 0.98.
The following metrics were obtained using this architecture: an accuracy of 0.9945, compared to the experts’ estimate of 0.9998.
In general, this approach works well for images where all teeth are present without omissions. However, if even one tooth is missing, there is a problem with its potential location in the image, as this method does not take into account the segmentation of areas where the tooth should be located.
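The IoU and AP50 figures quoted for these studies rest on a simple overlap measure. As a minimal sketch (not the evaluation code of any reviewed study), the Python snippet below computes IoU for two binary masks and applies the 0.5 threshold that AP50 uses to decide whether a predicted tooth counts as a correct detection. The function names and the toy 4x4 masks are illustrative assumptions.

```python
import numpy as np


def mask_iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Intersection over Union of two binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Both masks are empty; treated as zero overlap in this sketch.
        return 0.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


def is_true_positive(pred: np.ndarray, target: np.ndarray,
                     threshold: float = 0.5) -> bool:
    """A prediction matches a ground-truth tooth if IoU >= threshold.

    threshold=0.5 corresponds to AP50, threshold=0.75 to AP75.
    """
    return mask_iou(pred, target) >= threshold


# Toy example: ground truth covers 4 pixels, prediction covers 6 pixels,
# and the two masks share 4 pixels.
target = np.zeros((4, 4), dtype=bool)
target[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool)
pred[1:3, 1:4] = True

iou = mask_iou(pred, target)  # 4 shared pixels / 6 pixels in the union
print(f"IoU = {iou:.3f}")
print("counts at AP50:", is_true_positive(pred, target, 0.5))
print("counts at AP75:", is_true_positive(pred, target, 0.75))
```

The same prediction can count as correct at the AP50 threshold and incorrect at AP75, which is why the reviewed studies report lower AP75 than AP50 for the same model.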
The paper [26] investigates a way to segment teeth. This is a relatively old study: it dates back to 2017 and uses the now obsolete AlexNet architecture, albeit with some modifications. The dataset consists of 100 OPG images. The distinguishing feature of this approach is that the authors segmented teeth not in the entire image, but only in selected fragments, which were obtained by calculating the location of the gap between the jaws (at 40-60% of the image height). The authors claim that this approach has high performance and an AP50 of about 0.93 for all types of teeth. With this approach, it is necessary either to normalize the data or to use only one image source, since the correctness of the algorithm depends on correctly detecting the gap between the jaws. In addition, this approach, like the previous one, is not capable of segmenting missing teeth.
Another approach was demonstrated in the article [27]. That study addressed tooth detection and numbering using regression [28], with a CNN proposed for tooth detection. In general, this approach can be described as follows:
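To illustrate why this fragment-based approach depends on normalized input, here is a minimal Python sketch that restricts processing to a horizontal band at 40-60% of the image height. It is not the procedure of [26], which is only summarized here. The function name, the fixed fractions as defaults, and the 1000x2000 placeholder image are assumptions made for illustration.

```python
import numpy as np


def crop_vertical_band(image: np.ndarray,
                       top_frac: float = 0.4,
                       bottom_frac: float = 0.6) -> np.ndarray:
    """Return the horizontal band between two fractions of the image height."""
    if not 0.0 <= top_frac < bottom_frac <= 1.0:
        raise ValueError("expected 0 <= top_frac < bottom_frac <= 1")
    height = image.shape[0]
    top = int(round(height * top_frac))
    bottom = int(round(height * bottom_frac))
    return image[top:bottom]


# Placeholder grayscale "OPG" of 1000 rows by 2000 columns.
opg = np.zeros((1000, 2000), dtype=np.uint8)
band = crop_vertical_band(opg)
print(band.shape)  # (200, 2000)
```

If a different device frames the jaws higher or lower in the image, a fixed band like this misses the region of interest, which is the normalization problem noted above.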
The dataset consisted of 818 OPG images obtained from 4 different X-ray machines: Osstem Implant, HDX WILL, PointNix, and General.
ResNet 18-based and DLA models were used. DLA is a deep aggregation network architecture that combines layers hierarchically to enhance object recognition. The best performance was obtained with the ResNet 18-based model. The segmentation metrics that the authors achieved are an AP50 of 0.91 and an IoU of 0.84. For the classification of teeth, the following metrics were obtained: precision of about 0.997 and recall of 0.972.
It is worth noting the practical significance of this approach. It makes it possible to segment and number teeth with high accuracy on 2D images, such as OPG images. This method can also handle missing teeth.
An important disadvantage is the dependence on image normalization, as well as the need to apply multiple machine learning algorithms, because several tasks must be solved: (1) finding the starting point (the authors used a machine learning algorithm with an MSE loss), (2) training the regression algorithm (which works only with images of the same scale), and (3) applying the CNN.
A simpler approach for segmentation and teeth numbering is described in the study [29]. It applies modern CNNs to panoramic images (OPG), namely Mask R-CNN, PANet, and ResNet. PANet is the path aggregation network used in YOLOv4. The dataset contains 778 OPG images from various sources. The best score, by a small margin, was obtained by the PANet model, so we consider its metrics. For segmentation, the reported metrics are an IoU of 0.71 and an AP50 of 0.97. For classification, the reported scores are: precision of 0.97, f1 score of 0.92, recall of 0.89, and precision of 0.98. According to the study results, modern models can number teeth from a 2D image such as the OPG without additional processing.
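The classification scores above are related through standard definitions. As a minimal sketch, the Python function below derives precision, recall, and F1 from detection counts. The counts in the usage example are hypothetical values chosen to give a recall of 0.89 and are not taken from [29].

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 89 teeth numbered correctly, 3 wrong labels,
# 11 teeth missed.
p, r, f1 = precision_recall_f1(tp=89, fp=3, fn=11)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it sits closer to the lower of the two values, so a model with high precision but lower recall still receives a moderate F1.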
This study does not provide any quality metrics for the performance of the algorithm; however, classified images are present. We determined that the information presented in the paper, namely image classification, is correct. However, the sample is quite small – only 40 images corresponding to 10 people.
In addition to dental segmentation, there are studies in the field of maxillofacial segmentation on X-rays. For example, in a study [30], the authors created a new architecture called EED-Net. It is an encoder-decoder network based on U-Net architectures [31], FCN-8 decoding method, and modified Inception-ResNet blocks [32].
For EED-Net training, 2602 panoramic images excluding caries and hypoplasia were used. Accuracy and IoU were used to evaluate the architecture. The reported values are: precision 0.993 and IoU 0.983, with rather high throughput (41 images per second).
In this subsection, we will look at two main solutions in the dental X-ray analysis market. These systems differ from the solutions presented above in that they solve the complex analysis problem and have a larger dataset for training.
The study [19] is directly related to the Diagnocat system: it examines the algorithm the system uses and its usefulness in practice as a dental assistant. Cone Beam Computed Tomography (CBCT) images are used as the source. The data were obtained using three different devices: Ortophos XG, Carestream Health, and PaX-i3D Smart. As the authors point out, devices from different manufacturers have different characteristics, on top of differences in settings, which makes data normalization necessary. The system consists of several models: ROI (region of interest) localization, tooth localization and numbering, a periodontitis module, a caries localization module, and a periapical lesion localization module. The general principle of this system is the following:
Sensitivity and specificity [13] were used as quality metrics for each of the classes. Averaging the results, we obtain the following metrics: sensitivity 0.78 and specificity 0.93. These results show that this system can be used to solve real problems and help the dentist.
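The averaging step described above can be sketched as follows. This minimal Python example computes sensitivity and specificity per class from confusion counts and then takes their unweighted mean. The class names and all counts are hypothetical and do not reproduce the data of [19].

```python
def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return sensitivity, specificity


# Hypothetical per-class confusion counts for two pathology classes.
counts = {
    "caries": {"tp": 80, "fn": 20, "tn": 90, "fp": 10},
    "periapical_lesion": {"tp": 70, "fn": 30, "tn": 95, "fp": 5},
}

per_class = {name: sensitivity_specificity(**c) for name, c in counts.items()}

# Unweighted (macro) average over classes.
mean_sensitivity = sum(s for s, _ in per_class.values()) / len(per_class)
mean_specificity = sum(sp for _, sp in per_class.values()) / len(per_class)
print(f"mean sensitivity={mean_sensitivity:.3f}, "
      f"mean specificity={mean_specificity:.3f}")
```

An unweighted mean gives rare and common pathologies equal weight. Weighting by class frequency is an alternative and would generally yield different averages.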
Diagnocat has a separate module whose task is to determine the degree of tooth decay [32]. A convolutional network with attention [34] is used to detect whether caries is localized near the crown of the tooth. Horizontal alignment of the teeth is used to improve the convolutional network. According to the authors, this approach scores better than a simple CNN: the reported F1 measure is 0.74, compared to 0.58 for a simple CNN architecture.
In this work, the authors used 153 images as a test set and 2800 images for training, as well as 1100 images without periapical lesions. The model used in this study was U-Net, applied to lesions in images obtained with CBCT. An accuracy of 0.99 and an IoU of 0.93 were reported as quality metrics.
Notably, the studied method is directly related to the Diagnocat system, which was discussed earlier. This method does not use any additional pre-processing of dental images. However, a large dataset allows it to achieve high metrics.
The next commercial system is the DENTECT system [35], which can number teeth according to FDI notation, as well as recommend certain treatments, namely periapical lesion therapy, fillings, root canal treatment (RCT), and surgical tooth removal. The authors note that the error rate regarding the need for surgical tooth removal is 0.21, and this system is designed to reduce that rate.
The quality scores for the DENTECT system are an IoU of 0.862 and an AP50 of 0.894. These metrics are worse than those of the commercial Diagnocat system. One of the reasons is that its training dataset is smaller than the Diagnocat dataset.
Current research shows that it is not yet possible to create an autonomous AI that performs the analysis entirely on its own [36-38]. Most of the errors are due to various anomalies and the current imperfection of analysis technologies. However, the studies above point toward an assistant that can perform a fairly accurate preliminary analysis, leaving the doctor only to check and, if necessary, correct the diagnosis. In this way, the doctor is relieved of analyzing routine clinical cases as well as of routine work with reports. This increases the efficiency of the physician's work and reduces the number of human errors.
Neural models can achieve quality comparable to experienced experts in the field, but they require large datasets, which is a serious problem because most of the data used in the reviewed studies are proprietary and not openly available. There are only a few common datasets that have been compiled or supplemented by previous researchers [21]. In addition, data from one or a few devices is often used because in that case image normalization is not needed. In some studies, authors still normalize data to the format they need, but only when the algorithm requires it to work. No one has investigated pre-processing of X-ray images as a way to improve quality metrics. Normalization of CBCT and OPG images can therefore be considered an open problem.
The analysis found that studies using the U-Net model with large datasets (on the order of 1000 images or more) achieved the highest scores. Other studies achieved high scores without large datasets by using additional methods, such as regression [27]. However, such methods usually require a specific input format, so normalization is needed.
It is worth noting that the anomaly classification is a separate task. This is due to the anomaly detection complexity and the need for additional algorithms to analyze images of teeth with caries.
In some modern studies, in addition to neural networks, additional methods have been used. These can be different types of transformations, as well as classical ML. This has helped to achieve a better result [28]. In general, such hybrid approaches with data preprocessing for quality improvement and neural networks for data processing show the highest scores. This shows the need to implement preprocessing steps since images can be taken on different devices with different characteristics.
Based on the information shown in Table 1, we propose a new architecture that combines several best practices. It will use more advanced models instead of outdated models. It is displayed in Figure 1.
For tooth detection and segmentation, we propose to use a regression-based method and the BEiT segmentation model [26]. This model is more advanced3 than U-Net. The next step is the detection of hard tissue pathologies of the oral cavity: caries, periodontitis, missing teeth, etc. In the case of caries, we need to segment the affected areas of the teeth, for which the same segmentation model can be reused. To detect whether a tooth is diseased or not, we can use classification models. Diagnocat uses ResNeXt, but there are models with better performance, for example ViT [27]. On the ImageNet classification benchmark, ResNeXt-based models reach 86.4% accuracy and ViT models 90.9%. Finally, the obtained results need to be aggregated into a report, e.g. a tooth formula in some notation.
------------------------
3https://paperswithcode.com/sota/semantic-segmentation-on-ade20k-val
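As an illustration of the final reporting step, the Python sketch below converts tooth numbers between two common notations for permanent teeth: the two-digit FDI notation and the Universal Numbering System (1-32). It is a sketch of the notation mapping only and not part of the proposed architecture. The function name and the list of detected teeth are hypothetical.

```python
def fdi_to_universal(fdi: int) -> int:
    """Convert a permanent-tooth FDI number (11-18, 21-28, 31-38, 41-48)
    to the Universal Numbering System (1-32)."""
    quadrant, position = divmod(fdi, 10)
    if quadrant not in (1, 2, 3, 4) or not 1 <= position <= 8:
        raise ValueError(f"not a permanent-tooth FDI number: {fdi}")
    if quadrant == 1:   # upper right: FDI 18..11 -> Universal 1..8
        return 9 - position
    if quadrant == 2:   # upper left: FDI 21..28 -> Universal 9..16
        return 8 + position
    if quadrant == 3:   # lower left: FDI 38..31 -> Universal 17..24
        return 25 - position
    return 24 + position  # lower right: FDI 41..48 -> Universal 25..32


# Hypothetical output of a numbering model, in FDI notation.
detected_fdi = [18, 11, 21, 28, 38, 31, 41, 48]
formula = {fdi: fdi_to_universal(fdi) for fdi in detected_fdi}
print(formula)
```

Deciduous teeth use FDI quadrants 5-8 and letters in the Universal system, so they would need a separate mapping. This sketch rejects them with a `ValueError`.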
The method proposed above combines the studies [19] and [27]. Currently, these studies report results obtained with obsolete convolution-based models, whereas transformers perform better on a wide variety of tasks. By combining the best of the methods considered with advanced segmentation and classification models, we aim to obtain better results and, on that basis, to build a high-quality dental assistant.
In general, the results of recent studies show that it has already been possible to achieve the desired quality, which will satisfy even such a demanding field as medicine. However, there are only a few systems that allow comprehensive image analysis. These systems rely on large datasets to train CNN models on a large number of images and obtain high metrics. We note the possibility of developing a better architecture for analyzing dental X-ray images using some of the above methods with a smaller dataset.
Notably, a task such as dental formula generation is feasible only in a modular architecture consisting of several neural networks, since it requires segmentation, numbering, and classification of dental anomalies. In addition, anomaly detection requires separate modules, which further complicates the task.
In future work, we plan to develop an open system, TANALEETH, whose purpose will be the analysis of OPG images and the generation of dental formulas in different notations. Its architecture is shown in Figure 1. Our next goal is to investigate the normalization of OPG images obtained from different sources and with defects.