A Machine Learning Model for Predicting Hepatocellular Carcinoma Risk in Patients With Chronic Hepatitis B

Hye Won Lee; Hwiyoung Kim; Taeyun Park; Soo Young Park; Young Eun Chon; Yeon Seok Seo; Jae Seung Lee; Jun Yong Park; Do Young Kim; Sang Hoon Ahn; Beom Kyung Kim; Seung Up Kim

Liver International. 2023;43(8):1813-1821.

Abstract and Introduction

Abstract

Background: Machine learning (ML) algorithms can be used to overcome the prognostic performance limitations of conventional hepatocellular carcinoma (HCC) risk models. We established and validated an ML-based HCC predictive model optimized for patients with chronic hepatitis B (CHB) infections receiving antiviral therapy (AVT).

Methods: Treatment-naïve CHB patients who were started on entecavir (ETV) or tenofovir disoproxil fumarate (TDF) were enrolled. We used a training cohort (n = 960) to develop a novel ML model that predicted HCC development within 5 years and validated the model using an independent external cohort (n = 1937). ML algorithms consider all potential interactions and do not use predefined hypotheses.

Results: The mean age of the patients in the training cohort was 48 years, and most patients (68.9%) were men. During the median 59.3 (interquartile range 45.8–72.3) months of follow-up, 69 (7.2%) patients developed HCC. Our ML-based HCC risk prediction model had an area under the receiver-operating characteristic curve (AUC) of 0.900, which was better than the AUCs of CAMD (0.778) and REAL B (0.772) (both p < .05). The better performance of our model was maintained (AUC = 0.872 vs. 0.788 for CAMD and 0.801 for REAL B) in the validation cohort. Using cut-off probabilities of 0.3 and 0.5, the cumulative incidence of HCC development differed significantly among the three risk groups (p < .001).

Conclusions: Our new ML model performed better than conventional models in predicting the risk of HCC development in CHB patients receiving AVT.
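
The three-group stratification described in the Results (cut-off probabilities of 0.3 and 0.5) can be illustrated with a minimal Python sketch. The abstract does not say how values exactly at a cut-off are assigned or what the groups are called, so the boundary handling and the labels "low", "intermediate" and "high" below are assumptions made for illustration only.

```python
# Cut-off probabilities reported in the abstract.
LOW_CUTOFF = 0.3
HIGH_CUTOFF = 0.5


def risk_group(probability: float) -> str:
    """Map a predicted 5-year HCC probability to one of three risk groups.

    Assumption: a value exactly at a cut-off falls into the higher group.
    The group names are hypothetical labels, not taken from the source.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if probability < LOW_CUTOFF:
        return "low"
    if probability < HIGH_CUTOFF:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    # Hypothetical model outputs, not patient data.
    for p in (0.05, 0.30, 0.42, 0.77):
        print(f"{p:.2f} -> {risk_group(p)}")
```

Each patient's predicted probability falls into exactly one group, which is what allows cumulative incidence curves to be compared across the three strata.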

Introduction

Hepatitis B virus (HBV) is the main cause of chronic liver disease, which affects approximately 250 million people globally.^[1] Progression is mediated by viral replication.^[2] Antiviral therapy (AVT) changes the natural course of chronic HBV infection.^[3,4] Strong and persistent HBV suppression by drugs with high genetic barriers, such as entecavir (ETV) and tenofovir disoproxil fumarate (TDF), reduces the risk of cirrhosis and hepatocellular carcinoma (HCC).^[5,6] However, antivirals do not completely eliminate the risk of HCC.^[6–9] The annual incidence of HCC was 0.01%–1.4% in non-cirrhotic patients, and 0.9%–5.4% in those with cirrhosis taking ETV or TDF.^[5]

Predicting the risk of HCC development is important for the surveillance and management of patients with chronic hepatitis B (CHB). Various HCC risk prediction systems have been devised. The modified REACH-B, AASL, RESCUE-B, PAGE-B, modified PAGE-B (mPAGE-B) and CAMD models were developed especially for patients on antivirals.^[10–16] However, despite extensive validation, some previous HCC models, e.g., PAGE-B and mPAGE-B, include only a limited number of baseline parameters and have modest accuracy.

To overcome these problems, machine learning (ML) can be applied to predict the risk of HCC. ML algorithms have been compared to conventional regression analysis for predicting HCC development.^[17] In a recent study of a cohort of 442 patients with compensated cirrhosis,^[18] ML showed significantly better risk stratification performance than other models. Another study showed that a deep learning model performed better than other models for predicting HCC development in patients with HBV-related cirrhosis. ML algorithms consider all potential interactions and do not use predefined hypotheses.^[19] Thus, the risk of overlooking unexpected predictor variables is reduced. In addition, ML can incorporate new clinical data, so the algorithms can be continuously updated and optimized with minimal oversight. Although traditional models have similar properties, ML methods can use even more parameters and give more accurate results than traditional ones.^[20]

We established an ML-based HCC prediction model for patients with CHB treated with ETV or TDF, and validated its performance in an independent cohort. We then compared the performance of our model with the mPAGE-B and CAMD models.

Table 1. Baseline characteristics of the study population.
Variables	Training cohort (n = 960)	Validation cohort (n = 1937)	p value
Age, years	48.1 ± 12.2	48.7 ± 11.5	.233
Male sex	661 (68.9)	1141 (58.9)	<.001
Weight, kg	65.7 ± 12.0	64.8 ± 11.0	.251
Height, cm	166.1 ± 8.7	165.1 ± 9.0	.216
Body mass index, kg/m²	23.7 ± 3.1	23.7 ± 4.1	.904
Hypertension	179 (18.7)	100 (5.2)	<.001
Diabetes mellitus	134 (14.0)	93 (4.8)	<.001
Cirrhosis	425 (44.3)	485 (25.0)	<.001
HBeAg positivity	376 (39.2)	1076 (55.6)	<.001
HBV DNA, log₁₀ IU/mL	6.1 ± 1.8	5.2 ± 2.2	<.001
AST, IU/L	32.0 (23–104)	39.0 (27–68)	<.001
ALT, IU/L	39.5 (22–100)	44 (26–161)	<.001
Total bilirubin, mg/dL	0.7 ± 0.3	0.8 ± 0.5	<.001
Platelet count, 10³/μL	165 ± 70	171 ± 72	.008
Serum albumin, g/dL	4.0 ± 0.4	4.3 ± 1.6	<.001
PT INR	1.0 ± 0.1	0.9 ± 0.5	<.001
AFP, ng/mL	4.6 (2.8–10.2)	3.7 (2.4–7.2)	<.001
Sodium, mmol/L	139.8 ± 3.0	139.8 ± 3.2	.105
Serum creatinine, mg/dL	0.92 ± 1.19	0.52 ± 0.94	<.001
mPAGE-B	10.7 ± 3.7	10.3 ± 3.4	.006
CAMD	10.0 ± 5.5	8.5 ± 4.9	<.001
Type of antiviral			.004
ETV	455 (47.4)	1029 (53.1)
TDF	505 (52.6)	908 (46.9)
Antiviral switch	72 (7.5)	128 (6.6)	.526

Note: Values are means ± SDs, medians (Q1–Q3) or numbers (%).
Abbreviations: AFP, alpha-fetoprotein; ALT, alanine aminotransferase; AST, aspartate aminotransferase; ETV, entecavir; HBeAg, hepatitis B e-antigen; HBV, hepatitis B virus; PT INR, prothrombin time international normalized ratio; TDF, tenofovir disoproxil fumarate.

Table 2. Independent predictors of HCC development in the training cohort.
Variable	Univariate p value	Multivariate HR	Multivariate 95% CI	Multivariate p value
Age	<.001	1.054	1.038–1.071	<.001
Male sex	<.001	2.126	1.523–2.966	<.001
Cirrhosis	<.001	3.112	2.127–4.554	<.001
HBV DNA, log₁₀ IU/mL	.016	0.924	0.853–1.001	.053
AST, IU/L	.109
ALT, IU/L	.002	0.999	0.998–1.000	.024
Total bilirubin, mg/dL	<.001	1.111	0.818–1.508	.501
Platelet count, 10³/μL	<.001	0.995	0.992–0.998	.003
Serum albumin, g/dL	<.001	0.728	0.521–1.016	.062
PT INR	<.001	2.880	1.728–4.801	<.001
AFP, ng/mL	<.001	1.001	1.000–1.001	.064
HBeAg positivity	.003	0.942	0.694–1.277	.699

Abbreviations: AFP, alpha-fetoprotein; ALT, alanine aminotransferase; AST, aspartate aminotransferase; CI, confidence interval; HCC, hepatocellular carcinoma; HR, hazard ratio; PT INR, prothrombin time international normalized ratio.
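
The hazard ratios for continuous variables in Table 2 are expressed per one unit of the listed measure, which makes them look small. A minimal sketch of rescaling them follows. It assumes the HRs come from a proportional hazards model with each continuous covariate entered linearly, which this excerpt does not state explicitly; the function name `scaled_hazard_ratio` is hypothetical.

```python
def scaled_hazard_ratio(hr_per_unit: float, delta: float) -> float:
    """Return the HR for a change of `delta` units, given the per-unit HR.

    Assumes a log-linear effect, so the HR for a change of `delta` units
    is the per-unit HR raised to the power `delta`.
    """
    if hr_per_unit <= 0:
        raise ValueError("a hazard ratio must be positive")
    return hr_per_unit ** delta


# Per-unit multivariate HRs taken from Table 2.
AGE_HR_PER_YEAR = 1.054
PLATELET_HR_PER_UNIT = 0.995  # per 10^3/uL

age_hr_10_years = scaled_hazard_ratio(AGE_HR_PER_YEAR, 10)
platelet_hr_50_lower = scaled_hazard_ratio(PLATELET_HR_PER_UNIT, -50)

if __name__ == "__main__":
    print(f"HR per 10 years of age: {age_hr_10_years:.2f}")
    print(f"HR per 50 x 10^3/uL lower platelet count: {platelet_hr_50_lower:.2f}")
```

Under these assumptions, a 10-year age difference corresponds to an HR of roughly 1.69, and a platelet count lower by 50 × 10³/μL to an HR of roughly 1.28. The confidence limits in the table can be rescaled in the same way.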

Table 3. Comparison between our ML-based risk prediction model of HCC and other models.
Model	AUC	95% CI lower	95% CI upper	p value
Training cohort
ML model	0.900	0.867	0.934
mPAGE-B	0.755	0.713	0.799	<.001
PAGE B	0.721	0.673	0.771	<.001
REAL B	0.772	0.735	0.810	<.001
HCC RESCUE	0.771	0.729	0.814	<.001
CAMD	0.778	0.740	0.818	<.001
Validation cohort
ML model	0.872	0.835	0.909
mPAGE-B	0.775	0.744	0.806	<.001
PAGE B	0.765	0.736	0.794	<.001
REAL B	0.801	0.771	0.831	<.001
HCC RESCUE	0.798	0.768	0.827	<.001
CAMD	0.788	0.757	0.817	<.001
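
To show what the AUC values in Table 3 measure, here is a minimal sketch that computes an AUC as the proportion of concordant case/non-case pairs. It treats the outcome as a simple binary label (HCC within the follow-up window or not) and ignores censoring. The excerpt does not describe how the authors computed their AUCs or confidence intervals, so this is an illustration of the metric rather than their method, and the scores and labels are hypothetical.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    A pair counts 1 when the case has the higher score and 0.5 on a tie.
    Labels are 1 for patients who developed HCC and 0 otherwise.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    cases = [s for s, y in zip(scores, labels) if y == 1]
    non_cases = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for case_score in cases:
        for other_score in non_cases:
            if case_score > other_score:
                concordant += 1.0
            elif case_score == other_score:
                concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


if __name__ == "__main__":
    # Hypothetical predicted probabilities and outcomes, not study data.
    toy_scores = [0.10, 0.40, 0.35, 0.80]
    toy_labels = [0, 0, 1, 1]
    print(f"AUC = {auc(toy_scores, toy_labels):.2f}")
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is the scale on which the ML model (0.900 and 0.872) is compared with the conventional scores.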