A Machine Learning Model for Predicting Hepatocellular Carcinoma Risk in Patients With Chronic Hepatitis B

Hye Won Lee; Hwiyoung Kim; Taeyun Park; Soo Young Park; Young Eun Chon; Yeon Seok Seo; Jae Seung Lee; Jun Yong Park; Do Young Kim; Sang Hoon Ahn; Beom Kyung Kim; Seung Up Kim


Liver International. 2023;43(8):1813-1821. 

Abstract and Introduction


Background: Machine learning (ML) algorithms can be used to overcome the prognostic performance limitations of conventional hepatocellular carcinoma (HCC) risk models. We established and validated an ML-based HCC predictive model optimized for patients with chronic hepatitis B (CHB) infections receiving antiviral therapy (AVT).

Methods: Treatment-naïve CHB patients who were started on entecavir (ETV) or tenofovir disoproxil fumarate (TDF) were enrolled. We used a training cohort (n = 960) to develop a novel ML model that predicted HCC development within 5 years and validated the model using an independent external cohort (n = 1937). ML algorithms consider all potential interactions and do not use predefined hypotheses.

Results: The mean age of the patients in the training cohort was 48 years, and most patients (68.9%) were men. During a median follow-up of 59.3 months (interquartile range 45.8–72.3), 69 (7.2%) patients developed HCC. Our ML-based HCC risk prediction model had an area under the receiver-operating characteristic curve (AUC) of 0.900, which was better than the AUCs of CAMD (0.778) and REAL B (0.772) (both p < .05). The better performance of our model was maintained (AUC = 0.872 vs. 0.788 for CAMD and 0.801 for REAL B) in the validation cohort. Using cut-off probabilities of 0.3 and 0.5, the cumulative incidence of HCC development differed significantly among the three risk groups (p < .001).

Conclusions: Our new ML model performed better than existing models (CAMD and REAL B) in predicting the risk of HCC development in CHB patients receiving AVT.
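As an illustration of the two quantities reported in the Results, the following minimal Python sketch shows how an AUC can be computed from predicted probabilities and how the cut-off probabilities of 0.3 and 0.5 could split patients into three risk groups. It is not the authors' implementation. The function names `auc` and `risk_group`, the group labels, and the handling of values that fall exactly on a cut-off are assumptions made for this example.

```python
from typing import Sequence


def risk_group(prob: float, low_cut: float = 0.3, high_cut: float = 0.5) -> str:
    """Map a predicted 5-year HCC probability to one of three risk groups.

    Assumption: a value exactly on a cut-off goes to the higher group
    (prob < 0.3 low, 0.3 <= prob < 0.5 intermediate, prob >= 0.5 high).
    The abstract gives only the cut-offs, not the boundary rule.
    """
    if not 0.0 <= prob <= 1.0:
        raise ValueError("prob must lie in [0, 1]")
    if prob < low_cut:
        return "low"
    if prob < high_cut:
        return "intermediate"
    return "high"


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen patient who developed
    HCC (label 1) has a higher score than a randomly chosen patient who
    did not (label 0); tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the study.
    outcomes = [0, 0, 1, 1]
    predicted = [0.10, 0.40, 0.35, 0.80]
    print(auc(outcomes, predicted))
    print([risk_group(p) for p in predicted])
```

The pairwise form of the AUC is quadratic in cohort size, which is acceptable for cohorts of the size described here; a rank-based formula gives the same value more efficiently for larger data sets.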


Hepatitis B virus (HBV) is the main cause of chronic liver disease, which affects approximately 250 million people globally.[1] Progression is mediated by viral replication.[2] Antiviral therapy (AVT) changes the natural course of chronic HBV infection.[3,4] Strong and persistent HBV suppression by drugs with high genetic barriers, such as entecavir (ETV) and tenofovir disoproxil fumarate (TDF), reduces the risk of cirrhosis and hepatocellular carcinoma (HCC).[5,6] However, antivirals do not completely eliminate the risk of HCC.[6–9] The annual incidence of HCC was 0.01%–1.4% in non-cirrhotic patients, and 0.9%–5.4% in those with cirrhosis taking ETV or TDF.[5]

Predicting the risk of HCC development is important for the surveillance and management of patients with chronic hepatitis B (CHB). Various HCC risk prediction systems have been devised. The modified REACH-B, AASL, RESCUE-B, PAGE-B, modified PAGE-B (mPAGE-B) and CAMD models were developed especially for patients on antivirals.[10–16] However, despite extensive validation, some previous HCC models, e.g., PAGE-B and mPAGE-B, include only a limited number of baseline parameters and achieve modest accuracy.

To overcome these problems, machine learning (ML) can be applied to predict the risk of HCC. ML algorithms have been compared to conventional regression analysis for predicting HCC development.[17] In a recent study of a cohort of 442 patients with compensated cirrhosis,[18] ML showed significantly better risk stratification performance than other models. Another study showed that a deep learning model performed better than other models for predicting HCC development in patients with HBV-related cirrhosis. ML algorithms consider all potential interactions and do not use predefined hypotheses.[19] Thus, the risk of overlooking unexpected predictor variables is reduced. In addition, ML can incorporate new clinical data, so the algorithms can be continuously updated and optimized with minimal oversight. Although traditional models have similar properties, ML methods can use even more parameters and give more accurate results than traditional ones.[20]

We established an ML-based HCC prediction model for patients with CHB treated with ETV or TDF, and validated its performance in an independent cohort. We then compared the performance of our model with the mPAGE-B and CAMD models.