Acute kidney injury (AKI) is a frequent and heterogeneous complication among critically ill patients in the intensive care unit (ICU), often associated with adverse outcomes. This study aimed to identify phenotypic subtypes of ICU patients with AKI and to evaluate their association with clinical outcomes.
Materials and methods
A secondary analysis was conducted using the MIMIC-IV database, including a cohort of adults with varying stages of AKI as well as patients without AKI. Factor analysis of mixed data, followed by hierarchical clustering, was used to identify patient phenotypes based on a wide range of clinical, demographic, laboratory, and treatment variables. Clusters were profiled using a multivariable logistic regression model.
Results
Among 1372 patients evenly distributed across AKI stages 0 (non-AKI) to 3 (n=343 per stage), two distinct clusters were identified. Cluster 2 (n=671) had significantly higher in-hospital mortality (54.7% vs. 21.9%, p<0.001) and a greater prevalence of higher AKI stages (p<0.001). Cluster 2 also showed a significantly greater frequency of sepsis, vasopressor and diuretic administration, chronic kidney disease, and heart failure, as well as a higher respiratory rate, heart rate, and serum phosphorus. Patients in cluster 2 were slightly younger and had lower arterial O2 pressure and blood pH. A logistic regression profiling model achieved an accuracy (95% CI) of 91.4% (89.8%, 92.8%) in predicting cluster assignment.
Conclusions
Two clinically distinct AKI-related phenotypes, with strong prognostic implications, were identified among patients admitted to the ICU. These findings highlight the potential of routine ICU data to enable phenotype-based risk stratification in AKI.
Acute kidney injury (AKI) is a critical condition characterized by a sudden decline in renal function and is frequently observed in the intensive care unit (ICU), with estimated incidence rates of 15% to 40% in critically ill populations and significant associated morbidity and mortality.1 Sepsis, cardiovascular instability, nephrotoxic drugs, and mechanical ventilation are major risk factors for AKI.2 AKI is associated with adverse outcomes, including prolonged ICU stays, increased need for renal replacement therapy, and high mortality.3 Predicting AKI onset and progression remains a major clinical challenge, as current risk stratification models do not fully capture the complex nature of the illness.4 Early detection of AKI is also essential to improve patient outcomes through targeted therapies.5 Machine learning-based predictive models offer a promising way to identify high-risk patients, enabling personalized risk stratification and clinical decision-making.6
Clustering techniques provide a powerful approach to identifying subgroups of patients with similar clinical characteristics.7 These methods have been used to stratify patients in ICU settings, improving diagnosis, resource allocation, and treatment strategies.8 In the context of AKI, clustering algorithms can uncover distinct risk profiles among ICU patients, facilitating early detection and individualized intervention.9 Current models of AKI severity in the ICU are limited in their ability to stratify patient risk comprehensively because they rely on a small number of markers, such as serum creatinine and urine output.10,11 There is a growing need for models that integrate diverse patient data to improve early risk assessment.12 However, heterogeneous, high-dimensional ICU data, which often include missing values and noise, complicate the identification of meaningful patient subgroups.13,14 Meanwhile, advanced machine learning approaches have shown promise in predicting AKI but require further refinement to improve transparency and integration into clinical workflows.15 Robust, interpretable clustering models could therefore enable more precise risk stratification in AKI.16,17
This study aimed to develop a clustering model that identifies distinct subgroups of ICU patients based on their risk of developing AKI. By incorporating comprehensive patient data, including demographics, vital signs, laboratory results, and comorbidities, we sought to identify phenotypic subtypes associated with AKI. Integrating clustering models into clinical workflows has the potential to improve risk stratification, optimize ICU resource allocation, and facilitate early detection and personalized treatment.17,18 We hypothesized that comprehensive patient data would help identify prognostically distinct clusters of patients with respect to AKI.
Methods
Population
This study is a secondary analysis of cross-sectional data derived from a large intensive care cohort. We used the Medical Information Mart for Intensive Care IV (MIMIC-IV, version 3.1) database, a publicly accessible critical care dataset developed by the Massachusetts Institute of Technology Laboratory for Computational Physiology in collaboration with the Beth Israel Deaconess Medical Center in Boston, Massachusetts (https://mimic.mit.edu).19 MIMIC-IV provides de-identified, high-resolution clinical data for patients admitted to intensive care units between 2008 and 2022 and reflects real-world ICU practice in a tertiary care setting. The database includes comprehensive records of patient demographics, vital signs, laboratory measurements, comorbidities, administered therapies, and outcomes. For this analysis, we selected adult patients (≥18 years) whose first ICU admission occurred between 2017 and 2022 and lasted at least 48h. This period was chosen to ensure consistency with contemporary diagnostic and therapeutic practices.
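The cohort criteria above (adults, first ICU admission only, a stay of at least 48h, admission between 2017 and 2022) can be expressed as a simple filter. The sketch below is a minimal Python illustration, not the authors' extraction code. The `IcuStay` record and its fields are hypothetical and do not mirror the MIMIC-IV schema; in particular, because MIMIC-IV shifts dates for de-identification, the study period is represented by a separate, hypothetical `admit_year` field rather than derived from `intime`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class IcuStay:
    """Hypothetical flattened ICU-stay record (not the MIMIC-IV schema)."""
    subject_id: int
    age: int            # age in years at ICU admission
    intime: datetime    # ICU admission time
    outtime: datetime   # ICU discharge time
    admit_year: int     # hypothetical calendar-year field for the study period


def select_cohort(stays, min_stay=timedelta(hours=48), years=(2017, 2022)):
    """Keep each patient's first ICU stay if it meets the age, duration, and period criteria."""
    first = {}
    for s in stays:
        # Retain only the earliest ICU admission per patient.
        if s.subject_id not in first or s.intime < first[s.subject_id].intime:
            first[s.subject_id] = s
    return sorted(
        (s for s in first.values()
         if s.age >= 18
         and s.outtime - s.intime >= min_stay
         and years[0] <= s.admit_year <= years[1]),
        key=lambda s: s.subject_id,
    )
```

Note that the 48h requirement is checked on the first stay itself, so a patient whose first stay was short is excluded even if a later stay was long; this is one reading of the criteria.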
Candidate variables
The analytical dataset included baseline demographic variables, pre-existing comorbid conditions, and a broad spectrum of laboratory measurements and physiologic markers of organ system dysfunction, representing cardiovascular, respiratory, renal, hepatic, and hematologic function. Data on medication exposure, specifically the administration of vasoactive agents and diuretics, were also incorporated to reflect treatment intent and hemodynamic support. Predictor variables were extracted from the first 24h after ICU admission to represent baseline clinical values. Because urine output documentation was missing or inconsistent across patient records, the guideline-defined urine output criteria for AKI classification were not applied.20 Consequently, AKI stage was determined from serum creatinine (Cr) measurements.20 Stage 0 (no AKI) included patients with no Cr increase meeting AKI criteria and no AKI-related International Classification of Diseases (ICD) diagnostic codes. This dual criterion was applied to minimize misclassification and ensure a true non-AKI reference group; patients with an AKI ICD code but no creatinine rise were therefore excluded from Stage 0. Stage 1 was defined by an increase in Cr to 1.5–1.9 times baseline or an absolute rise of ≥0.3mg/dL within 48h. Stage 2 was defined by an increase in Cr to 2.0–2.9 times baseline, and Stage 3 by an increase in Cr to ≥3.0 times baseline or an absolute Cr of ≥4.0mg/dL. Baseline serum creatinine was defined as the first available measurement at ICU admission. Pre-existing chronic kidney disease was identified from ICD diagnostic codes. To enhance cohort validity and reduce bias from incomplete data, only patients with less than 25% missing data across study variables were included; no additional patients were excluded. This strategy ensured data robustness while preserving clinical heterogeneity.
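The creatinine-based staging rules can be sketched as a small function. This is an illustrative Python sketch, not the study's code. It assumes measurements arrive as (hours since ICU admission, Cr in mg/dL) pairs, takes the first value as baseline, and checks the ≥0.3mg/dL rise against any earlier value from the preceding 48h. As a further assumption, following KDIGO's framing of the ≥4.0mg/dL criterion as an increase, that criterion is applied only when creatinine has also risen. The `has_aki_icd` flag implements the ICD-based exclusion from Stage 0.

```python
def aki_stage(measurements, has_aki_icd=False):
    """Creatinine-only AKI stage from (hours_since_icu_admission, cr_mg_dl) pairs.

    Returns 0-3, or None for a patient with an AKI ICD code but no qualifying
    creatinine rise (excluded from the Stage 0 reference group).
    """
    eps = 1e-9                       # guards threshold comparisons against float rounding
    meas = sorted(measurements)      # chronological order
    baseline = meas[0][1]            # first creatinine at ICU admission
    stage = 0
    for i, (t, cr) in enumerate(meas):
        ratio = cr / baseline
        # Absolute rise of >=0.3 mg/dL over any earlier value from the preceding 48 h.
        rise_48h = any(cr - prev_cr >= 0.3 - eps
                       for prev_t, prev_cr in meas[:i] if t - prev_t <= 48)
        rising = ratio >= 1.5 - eps or rise_48h
        if ratio >= 3.0 - eps or (cr >= 4.0 - eps and rising):
            current = 3
        elif ratio >= 2.0 - eps:
            current = 2
        elif rising:
            current = 1
        else:
            current = 0
        stage = max(stage, current)  # keep the worst stage reached
    if stage == 0 and has_aki_icd:
        return None
    return stage
```

For example, a rise from 1.0 to 1.35mg/dL within 24h gives Stage 1 through the absolute-rise rule, even though the ratio stays below 1.5.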
The final analytical sample included patients across all four AKI severity strata (Stages 0–3; total n=2281). To ensure balanced representation and facilitate unbiased phenotypic discovery across the full spectrum of AKI severity, we selected an equal number of patients from each AKI stage for the clustering analysis. Because the smallest subgroup, AKI Stage 2, comprised 343 patients, we randomly selected 343 patients from each stage, yielding an analytical cohort of 1372 patients.
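The equal-size stratified draw can be sketched with Python's standard library. The function name and the fixed seed are illustrative assumptions; the text does not state how the random selection was implemented. The per-stage size is the smallest stage's count, so that stage is kept whole and the larger stages are down-sampled without replacement.

```python
import random
from collections import defaultdict


def balanced_sample(patients, stage_of, seed=2025):
    """Randomly draw the same number of patients from each AKI stage.

    `stage_of` maps a patient record to its AKI stage. The smallest stage is kept
    whole; larger stages are sampled without replacement down to that size.
    """
    by_stage = defaultdict(list)
    for p in patients:
        by_stage[stage_of(p)].append(p)
    n_per_stage = min(len(group) for group in by_stage.values())
    rng = random.Random(seed)  # fixed (hypothetical) seed for a reproducible draw
    sample = []
    for stage in sorted(by_stage):
        sample.extend(rng.sample(by_stage[stage], n_per_stage))
    return sample
```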
Data availability
The raw, de-identified data used in this study are publicly accessible through the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, available at https://mimic.mit.edu.19 MIMIC-IV is a relational database constructed from the electronic health records of patients admitted to the Beth Israel Deaconess Medical Center in Boston, Massachusetts. Access to MIMIC-IV requires completion of appropriate training and approval under a data use agreement to ensure responsible and ethical use of sensitive health information.
Ethical considerations
This study used data from the publicly available MIMIC-IV database, which is de-identified; the study involved no direct interaction with human subjects, no collection of personal information, and no procedures requiring informed consent. The analysis followed the ethical standards governing the use of publicly available clinical data. Use of the MIMIC-IV database requires completion of ethics training, ensuring responsible handling of sensitive health information. No raw data were republished or redistributed, and all reported findings are based solely on aggregated analyses. The study followed the ethical guidelines of the Declaration of Helsinki and was approved by the institutional review board of our university (reference number IR.SBMU.RETECH.REC.1403.896). To obtain access to the data, we completed the CITI “Data or Specimens Only Research” course (record ID: 53106331) and signed the PhysioNet Credentialed Health Data Use Agreement 1.5.0 on March 17, 2025.
Data analyses
The initial phase of analysis involved rigorous data preprocessing to ensure completeness and reliability. Variables with substantial missingness (>20%), high cardinality, or severe binary imbalance (<10% prevalence) were excluded. For each pair of features correlated above a moderate threshold (|Spearman ρ|>0.5), one feature was removed to reduce the risk of multicollinearity. This threshold was chosen to prevent collinearity-driven distortions in distance-based clustering and to promote model parsimony suitable for clinical translation. Missing data were imputed using predictive mean matching to preserve distributional properties. Factor analysis of mixed data (FAMD) was used to project the high-dimensional clinical data onto lower-dimensional principal components, and hierarchical clustering on principal components was applied to the transformed data to identify latent patient subgroups. The number of clusters was suggested by the dendrogram and further supported by the average silhouette width, which quantifies intra-cluster cohesion and inter-cluster separation. Clustering validity was additionally assessed with the cophenetic correlation coefficient, which measures how faithfully the dendrogram preserves the original pairwise distances. Variable contributions and squared cosine (Cos2) metrics were used to identify the most informative features. Cluster differentiation was tested with chi-squared tests comparing AKI stage and mortality distributions, and survival across clusters was compared using Kaplan–Meier curves with log-rank testing. To further characterize cluster-defining features, a logistic regression model was trained using high-importance variables identified in the clustering phase. The regularization hyperparameters (alpha and lambda) were tuned via 10-fold cross-validation.
Classification performance with 10-fold cross-validation was evaluated using the area under the ROC curve (AUC), sensitivity, specificity, accuracy, kappa statistic, and positive/negative predictive values. A confusion matrix was used to assess classification agreement.
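Every metric in this list except the AUC can be derived from the 2×2 confusion matrix. The Python sketch below (function name hypothetical) computes them with cluster 2 as the positive class. Applied to the counts in Table 2 (TP=612, FN=59, FP=59, TN=642), it reproduces the reported accuracy, sensitivity, specificity, predictive values, no information rate, and kappa to the printed precision. The accuracy confidence interval and the AUC require additional inputs and are not computed here.

```python
def binary_metrics(tp, fn, fp, tn):
    """Standard classification metrics from a 2x2 confusion matrix."""
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    sensitivity = tp / (tp + fn)        # recall for the positive class
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)                # positive predictive value
    npv = tn / (tn + fn)                # negative predictive value
    # No information rate: accuracy of always predicting the larger reference class.
    nir = max(tp + fn, tn + fp) / n
    # Cohen's kappa: observed agreement corrected for agreement expected by chance
    # from the predicted and reference margins.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "nir": nir, "kappa": kappa}


metrics = binary_metrics(tp=612, fn=59, fp=59, tn=642)
```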
Results
Sample
The sample included 1372 patients with AKI severity evenly distributed across stages 0–3 (n=343 per stage). The median follow-up time was 5.5 days (interquartile range, 3.1–11.9 days). There were no duplicate rows in the study dataset. Fig. 1 illustrates the initial variables and the patterns of missing data. In total, 11.4% of the initial data were missing. Variables with >20% missing data were excluded from further analysis: cardiac output, C-reactive protein, direct bilirubin, central venous pressure, arterial O2 saturation, serum albumin, and marital status. Race was also excluded because of its high cardinality (30 levels). Binary features were evaluated for imbalance, and those with fewer than 10% positive cases were excluded. Specifically, among comorbidities, liver disease (3.7% positive), type 1 diabetes mellitus (2.2%), cerebrovascular disease (1.2%), and acute myocardial infarction (0.4%) were excluded. Similarly, the medication variables dobutamine (4.6%), milrinone (3.1%), dopamine (1.5%), and bumetanide (1.3%) were excluded because of low prevalence. Fig. 2 shows a heatmap of Spearman correlation coefficients among the study variables. There were substantial, statistically significant correlations between ALT and AST (ρ=0.86), serum chloride and sodium (ρ=0.63), norepinephrine and vasopressin (ρ=0.61), systolic and diastolic blood pressure (ρ=0.57), prothrombin time and total bilirubin (ρ=0.55), and arterial CO2 pressure and pH (ρ=−0.51); all p<0.01. To reduce redundancy and the risk of multicollinearity, one variable from each correlated pair was excluded: AST, serum chloride, vasopressin, diastolic blood pressure, total bilirubin, and arterial CO2 pressure.
These selections were based on clinical relevance, physiological specificity, and the goal of retaining the variable that provides broader or more direct insight into patient status. The remaining data contained only 3.8% missing values, which were imputed using predictive mean matching.
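Predictive mean matching differs from plain regression imputation because each missing value is replaced by an observed value borrowed from a donor with a similar predicted mean, so imputations stay within the range and shape of the observed data. A deliberately simplified, single-predictor Python sketch of the matching step follows. Full PMM implementations use multiple predictors and also draw the regression coefficients from their posterior distribution; this sketch omits both. The `donors` default of 5 and the seed are assumptions.

```python
import random


def pmm_impute(x, y, donors=5, seed=1):
    """Simplified single-predictor predictive mean matching.

    Fits y ~ x by ordinary least squares on complete cases, then fills each missing
    y with the observed y of a donor drawn at random from the `donors` complete
    cases whose predicted values are closest. Imputed values are therefore always
    values that were actually observed.
    """
    obs = [i for i, v in enumerate(y) if v is not None]
    mx = sum(x[i] for i in obs) / len(obs)
    my = sum(y[i] for i in obs) / len(obs)
    slope = (sum((x[i] - mx) * (y[i] - my) for i in obs)
             / sum((x[i] - mx) ** 2 for i in obs))
    intercept = my - slope * mx
    pred = [intercept + slope * xi for xi in x]
    rng = random.Random(seed)
    filled = list(y)
    for i, v in enumerate(y):
        if v is None:
            # Rank complete cases by closeness of predicted means and keep the nearest donors.
            nearest = sorted(obs, key=lambda j: abs(pred[j] - pred[i]))[:donors]
            filled[i] = y[rng.choice(nearest)]
    return filled
```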
Clustering
Fig. 3A presents a scatter plot of the first and second principal components, with deceased patients indicated in black. Deceased patients are concentrated on the right side of the plot, suggesting an association between the first component and mortality. Fig. 3B displays the same scatter plot colored by AKI stage (0–3). Patients with lower AKI stages are located predominantly on the left side of the plot, and stages 1 and 2 overlap substantially, particularly along the first principal component. Hierarchical clustering on principal components suggested a two-cluster solution (Fig. 3C). The average silhouette width was 0.56, indicating moderately well-separated clusters. The cophenetic correlation coefficient was 0.59, suggesting that the dendrogram moderately preserved the original distance structure. Fig. 3D presents a scatter plot of the hierarchical clustering results. Fig. 3E and F show the variable contributions and the squared cosine (Cos2) values of the variables, respectively, indicating that the clusters can be distinguished using a subset of patient features. The clear separation between the two cluster centroids, together with the distribution of clinical severity along the first principal component, suggests that the clustering scheme may have prognostic significance. Cluster 1 included 701 patients and cluster 2 included 671 patients. Cluster 1 included 314 (44.8%), 159 (22.7%), 148 (21.1%), and 80 (11.4%) patients with AKI stages 0, 1, 2, and 3, respectively, whereas cluster 2 included 29 (4.3%), 184 (27.4%), 195 (29.1%), and 263 (39.2%) patients in the same stages. The distribution of AKI severity differed significantly between clusters, χ2(3)=342.210, p<0.001.
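The average silhouette width summarizes, for each patient, how much closer that patient lies to its own cluster than to the nearest other cluster. A minimal Python sketch using Euclidean distance follows. The paragraph above does not state which coordinates the silhouette was computed on (for example, the retained principal components), so the input points are left generic, and the function name is hypothetical.

```python
import math


def average_silhouette(points, labels):
    """Mean silhouette width using Euclidean distance.

    For each point, a = mean distance to the other members of its own cluster and
    b = the smallest mean distance to the members of any other cluster; the point's
    silhouette is (b - a) / max(a, b). Values near 1 indicate well-separated clusters.
    """
    scores = []
    for i, p in enumerate(points):
        dists = {}
        for j, q in enumerate(points):
            if i != j:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.get(labels[i], [])
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for lab, d in dists.items() if lab != labels[i])
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)
```

For two tight, well-separated groups such as (0,), (1,) and (10,), (11,), the average is close to 0.9. Assigning each pair across the two groups instead makes it negative, which is how the statistic flags a poor partition.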
There was also a statistically significant difference in mortality between cluster 1 (154 deaths; 21.9%) and cluster 2 (367 deaths; 54.7%), χ2(1)=154.500, p<0.001, odds ratio (95% CI)=4.28 (3.39, 5.42). Overall, cluster 2 represented a high-risk subgroup with greater AKI severity and higher mortality, indicating a poorer prognosis than cluster 1. Fig. 4A shows a significant difference in 30-day survival probability between the two clusters, confirmed by the log-rank test.
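The χ2(3) statistic for the AKI stage distribution can be checked from the cluster-by-stage counts given in the preceding paragraph. The Python sketch below (function name hypothetical) computes the Pearson chi-squared statistic without continuity correction. For these counts it gives approximately 342.21 with 3 degrees of freedom. The p-value lookup is omitted to keep the example within the standard library.

```python
def pearson_chi_square(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table
    (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(table):
        for c, observed in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof


# Rows: clusters 1 and 2; columns: AKI stages 0, 1, 2, 3.
stage_table = [[314, 159, 148, 80],
               [29, 184, 195, 263]]
chi2, dof = pearson_chi_square(stage_table)
```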
Fig. 3. (A) Scatter plot of the first two principal components, with deceased patients marked in black. Deceased patients are more frequent on the right side of the plot, indicating an association between the first component and mortality. (B) Scatter plot of the first two principal components colored by AKI stage (0–3). Patients with lower AKI stages lie mainly on the left, and stages 1 and 2 overlap along the first principal component. (C) Dendrogram from hierarchical clustering on the principal components, suggesting two distinct patient clusters. (D) Scatter plot of patient clusters derived from hierarchical clustering on principal components. (E) Variable contributions to the clustering; features above the average contribution (dashed red line) distinguish the two clusters. (F) Squared cosine (Cos2) values of the variables, indicating the quality of variable representation.
Fig. 4. (A) Kaplan–Meier curves of 30-day survival probability for the two clusters. The log-rank test showed a significant difference between clusters, with higher mortality in cluster 2. (B) Receiver operating characteristic (ROC) curve of the logistic model in discriminating between the two clusters; the area under the curve (AUC) indicates strong discrimination.
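The 30-day survival curves in Fig. 4A are Kaplan–Meier (product-limit) estimates. That calculation can be sketched in a few lines of Python. The log-rank comparison between clusters is not included. The `horizon` argument, which censors follow-up at a fixed time such as 30 days, is an assumption about how the 30-day curves were truncated.

```python
def kaplan_meier(times, events, horizon=None):
    """Kaplan-Meier survival estimate.

    times: follow-up time per patient; events: 1 = death, 0 = censored.
    If `horizon` is given, follow-up beyond it is censored at the horizon
    (e.g., 30 days for 30-day survival). Returns [(event_time, survival), ...].
    """
    data = []
    for t, e in zip(times, events):
        if horizon is not None and t > horizon:
            t, e = horizon, 0
        data.append((t, e))
    survival, curve = 1.0, []
    for t in sorted({u for u, e in data if e == 1}):
        at_risk = sum(1 for u, _ in data if u >= t)          # still under observation at t
        deaths = sum(1 for u, e in data if u == t and e == 1)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve
```

Patients censored at a given time are still counted as at risk for deaths at that same time, which is the usual convention for ties.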
We developed a logistic regression model to assess how well the features identified as important in the cluster analysis discriminate between the two clusters. Hyperparameter tuning via 10-fold cross-validation selected an alpha of 0.100 and a lambda of 0.044. Table 1 presents the logistic regression model, showing the variables that differentiate the two clusters and define a distinct high-risk phenotype. Variables related to critical illness severity, such as sepsis, vasopressor use, chronic comorbidities (e.g., chronic kidney disease and heart failure), and markers of organ dysfunction, were strongly associated with membership in the high-risk cluster, as reflected by their large regression coefficients and odds ratios. Elevated respiratory and heart rates were also statistically significant, further supporting the physiological distinctiveness of this group. In contrast, higher arterial oxygen pressure, higher pH, and older age were negatively associated with high-risk cluster membership. These patterns may help clinicians recognize ICU patients whose phenotypic features are compatible with more severe AKI trajectories. Model performance was evaluated using 10-fold cross-validation. Fig. 4B shows the model's ROC curve; the high AUC indicates strong discrimination between the two clusters and supports the internal consistency of the clustering scheme. Table 2 summarizes the model's diagnostic performance in identifying the positive class (cluster 2). The model showed strong and balanced classification performance, with high concordance between predicted and actual labels. The numbers of false positives and false negatives were equal (59 each), indicating no systematic misclassification toward either cluster.
Overall, these results support the model's reliability and robustness in identifying the two patient profiles.
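The text reports alpha and lambda but does not name the software. Assuming the glmnet-style elastic-net parameterization, which uses these two names, the regularized logistic model would minimize the penalized negative log-likelihood below. Here $y_i=1$ for cluster 2 and $\mathbf{x}_i$ holds the profiling variables. Under that reading, alpha = 0.1 places most of the penalty on the ridge (squared L2) term, and lambda = 0.044 sets the overall penalty strength.

```latex
\min_{\beta_0,\,\boldsymbol{\beta}}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\,y_i\big(\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}\big)
-\log\!\Big(1+e^{\beta_0+\mathbf{x}_i^{\top}\boldsymbol{\beta}}\Big)\Big]
+\lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\boldsymbol{\beta}\rVert_2^{2}+\alpha\,\lVert\boldsymbol{\beta}\rVert_1\Big]
```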
Table 1. Logistic regression model for cluster profiling.
| Characteristic | Total | Cluster 1 | Cluster 2 | Coeff. (SE) | AOR (95% CI) | p |
|---|---|---|---|---|---|---|
| Sepsis, n (%) | 523 (38.1) | 87 (12.4) | 436 (65.0) | 3.663 (0.305) | 38.98 (21.42, 70.93) | <0.001* |
| Norepinephrine, n (%) | 596 (43.4) | 147 (21.0) | 449 (66.9) | 2.815 (0.317) | 16.7 (8.97, 31.06) | <0.001* |
| Chronic kidney disease, n (%) | 336 (24.5) | 98 (14.0) | 238 (35.5) | 2.694 (0.324) | 14.79 (7.84, 27.9) | <0.001* |
| Heart failure, n (%) | 443 (32.3) | 156 (22.3) | 287 (42.8) | 2.573 (0.31) | 13.11 (7.14, 24.07) | <0.001* |
| Respiratory failure, n (%) | 731 (53.3) | 246 (35.1) | 485 (72.3) | 1.874 (0.255) | 6.52 (3.95, 10.74) | <0.001* |
| Epinephrine, n (%) | 177 (12.9) | 55 (7.8) | 122 (18.2) | 1.53 (0.445) | 4.62 (1.93, 11.05) | 0.001* |
| Phosphorus (mg/dL) | 4.4 (2.0) | 3.7 (1.4) | 5.2 (2.2) | 0.691 (0.077) | 2.00 (1.72, 2.32) | <0.001* |
| Furosemide, n (%) | 642 (46.8) | 275 (39.2) | 367 (54.7) | 0.638 (0.286) | 1.89 (1.08, 3.31) | 0.026* |
| Phenylephrine, n (%) | 482 (35.1) | 221 (31.5) | 261 (38.9) | 0.192 (0.287) | 1.21 (0.69, 2.13) | 0.505 |
| Respiratory rate (breaths/min) | 19.9 (6.9) | 17.7 (5.8) | 22.2 (7.1) | 0.119 (0.02) | 1.13 (1.08, 1.17) | <0.001* |
| Heart rate (bpm) | 91.7 (21.0) | 84.6 (17.2) | 99.2 (22.0) | 0.069 (0.008) | 1.07 (1.06, 1.09) | <0.001* |
| Magnesium (mg/dL) | 2.2 (0.7) | 2.2 (0.6) | 2.2 (0.7) | 0.068 (0.184) | 1.07 (0.75, 1.54) | 0.712 |
| Arterial O2 pressure (mmHg) | 179.3 (112.6) | 220.2 (120.5) | 136.6 (84.9) | −0.01 (0.001) | 0.99 (0.99, 0.99) | <0.001* |
| Age (years) | 60.1 (14.0) | 60.3 (13.6) | 60.0 (14.3) | −0.022 (0.009) | 0.98 (0.96, 1.00) | 0.013* |
| pH | 7.3 (0.1) | 7.4 (0.1) | 7.3 (0.1) | −12.946 (1.516) | 0.00 (0.00, 0.00) | <0.001* |
Values are n (%) for categorical variables and mean (SD) for continuous variables. SD, standard deviation; SE, standard error; AOR, adjusted odds ratio; CI, confidence interval. *p<0.05.
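Each AOR in Table 1 is the exponentiated coefficient. The reported intervals are consistent with the usual Wald form, exp(β ± 1.96·SE), where SE is the value in parentheses after each coefficient. The Python sketch below (function name hypothetical) illustrates the conversion; small discrepancies in the last digit arise because the printed coefficients are rounded. Rescaling also makes continuous effects easier to read. The pH coefficient of −12.946 applies per full pH unit, which is why its AOR prints as 0.00; per 0.1-unit increase, the odds ratio is about 0.27. Likewise, per 10 bpm of heart rate, it is roughly 2.

```python
import math


def adjusted_odds_ratio(coef, se, z=1.96):
    """Odds ratio with a Wald 95% CI from a logistic coefficient and its SE."""
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)


# Sepsis row of Table 1: coefficient 3.663, SE 0.305.
sepsis_aor, sepsis_lo, sepsis_hi = adjusted_odds_ratio(3.663, 0.305)

# Rescaled effects: multiply the coefficient and SE by the increment of interest.
hr_per_10bpm = adjusted_odds_ratio(10 * 0.069, 10 * 0.008)[0]    # heart rate, per 10 bpm
ph_per_0_1 = adjusted_odds_ratio(0.1 * -12.946, 0.1 * 1.516)[0]  # pH, per 0.1 unit
```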
Table 2. Confusion matrix and performance metrics for the study logistic model (positive class = cluster 2).
Confusion matrix
| Prediction | Reference: Cluster 1 | Reference: Cluster 2 |
|---|---|---|
| Cluster 1 | 642 | 59 |
| Cluster 2 | 59 | 612 |
Performance metrics
| Metric | Value |
|---|---|
| Accuracy (95% CI) | 0.914 (0.898, 0.928) |
| No information rate | 0.511 |
| p (accuracy > no information rate) | <0.001* |
| Kappa | 0.8279 |
| Sensitivity | 0.912 |
| Specificity | 0.916 |
| Positive predictive value | 0.912 |
| Negative predictive value | 0.916 |
| Area under the curve | 0.976 |
This study aimed to identify phenotypes of ICU patients with respect to AKI using a data-driven clustering framework, hypothesizing that multidimensional patient profiles could identify prognostically meaningful subgroups beyond conventional AKI staging. The findings supported the presence of distinct clinical clusters associated with different degrees of physiologic derangement and outcome risk. By incorporating a wide range of clinical, laboratory, and therapeutic variables, the clustering model captured heterogeneity that is not reflected in serum creatinine-based staging systems. The unsupervised learning strategy revealed a high-risk cluster characterized by systemic inflammation, cardiovascular compromise, and multiorgan dysfunction. The logistic profiling model showed that sepsis, chronic kidney disease, heart failure, respiratory failure, and vasopressor use, particularly norepinephrine and epinephrine, were strongly associated with this high-risk cluster. Elevated serum phosphorus, increased respiratory and heart rates, and loop diuretic administration also contributed to cluster differentiation, suggesting a convergence of hemodynamic instability and therapeutic intensity. In contrast, higher arterial oxygenation and higher blood pH were inversely associated with the high-risk phenotype, reinforcing the multidimensional nature of critical illness severity. This approach identified a vulnerable ICU subgroup and provides a foundation for future phenotype-targeted strategies in critical care.
Recent studies support the use of unsupervised machine learning to identify phenotypically distinct subtypes among ICU patients with AKI. A study of dialysis-requiring AKI showed that phenotypes characterized by systemic inflammation, hemodynamic instability, and organ dysfunction correlate with significantly worse outcomes.21 Deep learning-based models have also suggested clinically meaningful AKI subgroups with differential mortality risks, highlighting the limitations of serum creatinine-based staging alone.22 Tan et al. found AKI phenotypes with variable trajectories and clinical patterns that could not be captured by KDIGO criteria alone.23 Compared with the seven subgroups identified by Tan et al., the smaller number of clinically distinct clusters in our study may enhance practical applicability by facilitating straightforward risk stratification and bedside implementation. Smith et al. suggested that AKI is a multifactorial syndrome best understood through longitudinal and multivariate modeling rather than conventional creatinine thresholds.24 Thongprayoon et al. also applied clustering to an AKI population and found distinct clusters with divergent comorbidities and laboratory profiles associated with outcomes, supporting our multimodal data integration.25 However, neither Tan et al. nor Thongprayoon et al. included patients without AKI in their samples. Our use of routinely collected ICU data is consistent with prior work suggesting that early AKI risk can be predicted from accessible variables, enabling real-time clinical application.26 Our cluster-based prognostic differentiation is also consistent with studies emphasizing the importance of AKI subtyping for guiding personalized treatment.27 Overall, the findings of this study are congruent with, and further reinforce, the growing body of evidence supporting phenotype-driven stratification in critical care.
The distinct clinical variables that separated the high- and low-risk AKI phenotypes in our study reflect pathophysiologic mechanisms driving organ dysfunction in critical illness. Sepsis, a dominant feature in the high-risk cluster, contributes to AKI through a combination of systemic inflammation, microvascular dysregulation, and mitochondrial dysfunction, leading to renal tubular cell stress and apoptosis.28 The use of vasopressors, particularly norepinephrine, while critical for maintaining perfusion, reflects the severity of shock and often exacerbates renal hypoperfusion and endothelial injury via excessive vasoconstriction and impaired autoregulation.29 Chronic kidney disease, more prevalent in our high-risk group, predisposes patients to AKI through maladaptive repair, capillary rarefaction, and pre-existing nephron loss, creating a vulnerability to sepsis and nephrotoxins.30 Heart failure compounds renal injury through elevated venous pressures, reduced cardiac output, and neurohormonal activation, which reduce glomerular filtration via altered transcapillary pressures.31 Respiratory failure and mechanical ventilation contribute to AKI through hypoxia, inflammation, and elevated intrathoracic pressures that impair renal perfusion and venous return.32 Low blood pH further disrupts cellular metabolism, enhances inflammation, and impairs myocardial contractility, exacerbating multiorgan dysfunction and correlating with higher mortality risk.33 Also, hyperphosphatemia, observed in the high-risk cluster, reflects cellular breakdown, impaired renal clearance, and systemic catabolism, all of which are markers of severe illness and poor prognosis.34 Age appeared inversely associated with high-risk cluster membership, suggesting that younger patients in our cohort were more frequently represented among those with severe acute physiologic disturbances. 
This pattern may reflect the predominance of high-intensity conditions such as sepsis, multiorgan failure, and vasopressor-dependent shock in younger individuals, who might develop more pronounced inflammatory and hemodynamic responses. Conversely, older patients, while burdened with chronic comorbidities, may experience less abrupt physiologic deterioration during ICU admission, aligning with the lower-risk phenotype.35 Together with higher arterial oxygenation and normal pH, these features may represent relative protective factors within the multidimensional context of AKI severity. Overall, these variables represent interrelated pathophysiological pathways connecting cardiovascular compromise, metabolic derangement, and inflammation, supporting the biological plausibility and clinical applicability of our phenotypic classification.
Clinical implications
This study provides evidence that AKI phenotypic profiling, by integrating a broader spectrum of patients’ clinical characteristics, may enhance risk stratification relative to conventional serum creatinine-based staging. Our findings suggest that high-risk patients can be identified from routinely available clinical data, enabling timely risk recognition. The strong discriminative performance of the logistic model supports the feasibility of building real-time clinical decision support tools that assign ICU patients to phenotypic clusters at admission. These phenotypic profiles could guide tailored interventions; for example, patients in the high-risk cluster may benefit from closer monitoring, early nephrology consultation, and aggressive hemodynamic and metabolic optimization. Compared with the conventional four-stage severity classification, this two-cluster approach may improve prognostic accuracy, facilitate communication with families, and support shared decision-making. Phenotype-guided research and trials could also enable the design of more targeted therapies and personalized treatment strategies in AKI management.
Limitations
This study had several limitations. The analysis was based on retrospective data, which may introduce biases inherent to observational studies, such as unmeasured confounding. Although the clustering approach was unsupervised, selection bias related to missing data and variable exclusion criteria cannot be fully eliminated. Nevertheless, data preprocessing supported statistical integrity by removing highly imbalanced or collinear variables and applying robust imputation. Race can affect the generalizability of research findings to other populations; however, we excluded it because of its high cardinality, sparse distribution, and inconsistent documentation, which could distort factor projections and bias cluster centroids. Our analysis therefore focused primarily on physiological and biochemical variables rather than sociodemographic determinants. Comorbidity and medication variables with fewer than 10% positive cases were also excluded to prevent statistical sparsity and instability in the factor analysis, because rare features can distort covariance structures and compromise cluster reproducibility. This exclusion preserved statistical power, ensured meaningful variable contributions to multidimensional variance, and enhanced the clinical interpretability of the resulting phenotypes. Strengths of the study include the use of unsupervised learning for phenotype discovery, equal representation of AKI severity stages, and internal validation of both clustering and classification performance. Nevertheless, external validation in an independent dataset is still needed to confirm the generalizability and reproducibility of the model across diverse populations and clinical settings.
Conclusion
This study identified two clinically and prognostically distinct phenotypic subgroups among ICU patients with respect to AKI, using a data-driven clustering approach applied to high-dimensional clinical data from the MIMIC-IV database. Unlike conventional AKI staging systems that rely on serum creatinine or urine output, our model integrated a wide range of clinical, laboratory, and treatment variables to identify latent patient profiles with significantly different outcomes. The high-risk cluster was characterized by a higher prevalence of sepsis, chronic kidney disease, heart failure, vasopressor use, and respiratory failure, together with derangements in vital signs and metabolic parameters, and was associated with higher mortality and more severe AKI. Our logistic profiling model distinguished between the clusters with high performance, supporting the internal validity of the phenotypic stratification. The ability to detect these phenotypes from routinely collected ICU data suggests that real-time clinical implementation may be feasible, offering opportunities to enhance early identification, personalize management, and support more informed communication with patients and families. These findings support a shift toward phenotype-based risk stratification and personalized intervention in critical care and highlight the translational potential of machine learning tools for improving AKI outcomes.
ORCID ID
Mohammad Fathi: 0000-0002-9214-724X
Nader Markazi Moghaddam: 0000-0003-2861-2765
Hamed Markazi Moghadam: 0000-0002-5416-1534
CRediT authorship contribution statement
Conceptualization: MoF, NMM; Formal analysis: NMM, HMM, MaF; Investigation: MH, NN, NMA; Methodology: SZBJ, NMA; Project administration: NMM, SZBJ; Resources: HMM; Software: NMM, HMM; Supervision: MoF; Validation: MH, NN; Visualization: NMM, HMM, MaF; Writing – original draft preparation: MaF, MH, NN, NMA; Writing – review & editing: MoF, HMM, NMM, SZBJ.
Ethics approval
Ethics approval was obtained from the institutional review board of Shahid Beheshti University of Medical Sciences (reference number IR.SBMU.RETECH.REC.1403.896).
Funding
This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of competing interests
The authors declare that they have no competing interests.











