Multicriteria Optimization of Language Models for Heart Failure With Preserved Ejection Fraction Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study

dc.contributor.authorMata Vázquez, Jacinto
dc.contributor.authorPachón Álvarez, Victoria
dc.contributor.authorManovel, Ana
dc.contributor.authorMaña López, Manuel Jesús
dc.contributor.authorVilla Cordero, Manuel de la
dc.date.accessioned2026-01-21T10:06:34Z
dc.date.available2026-01-21T10:06:34Z
dc.date.issued2025
dc.description.abstractBackground: Heart failure with preserved ejection fraction (HFpEF) is a major clinical manifestation of cardiac amyloidosis, a condition frequently underdiagnosed due to its nonspecific symptomatology. Electronic health records (EHRs) offer a promising avenue for supporting early symptom detection through natural language processing. However, identifying relevant clinical cues within unstructured narratives, particularly in Spanish, remains a significant challenge due to the scarcity of annotated corpora and domain-specific models. This study proposes and evaluates a Transformer-based natural language processing framework for automated detection of HFpEF-related symptoms in Spanish EHRs.
Objective: The aim of this study is to assess the feasibility of leveraging unstructured clinical narratives to support early identification of heart failure phenotypes indicative of cardiac amyloidosis. It also examines how domain-specific language models and clinically guided optimization strategies can improve the reliability, sensitivity, and generalizability of symptom detection in real-world EHRs.
Methods: A novel corpus of 15,304 Spanish clinical documents was manually annotated and validated by cardiology experts. The corpus was derived from the records of 262 patients (173 with suspected cardiac amyloidosis and 89 without). In total, 8 Transformer-based language models were evaluated, including general-purpose models, biomedical-specialized variants, and Longformers. Three clinically motivated optimization strategies were implemented to align models’ behavior with different diagnostic priorities: maximizing area under the curve (AUC) to enhance overall discrimination, optimizing F1-score to balance sensitivity and precision, and prioritizing sensitivity to minimize false negatives. These strategies were independently applied during the fine-tuning of the models to assess their impact on performance under different clinical constraints. To ensure robust evaluation, testing was conducted on a dataset composed exclusively of previously unseen patients, allowing performance to be assessed under realistic and generalizable conditions.
Results: All models achieved high performance, with AUC values above 0.940. The best-performing model, Longformer Biomedical-clinical, reached an AUC of 0.987, F1-score of 0.985, sensitivity of 0.987, and specificity of 0.987 on the test dataset. Models optimized for sensitivity reduced the false-negative rate to under 3%, a key threshold for clinical safety. Comparative analyses confirmed that domain-adapted, long-sequence models are better suited for the semantic and structural complexity of Spanish clinical texts than general-purpose models.
Conclusions: Transformer-based models can reliably detect HFpEF-related symptoms from Spanish EHRs, even in the presence of class imbalance and substantial linguistic complexity. The results show that combining domain-specific pretraining with long-context modeling architectures and clinically aligned optimization strategies leads to substantial gains in classification performance, particularly in sensitivity. These models not only achieve high accuracy and generalization on unseen patients but also demonstrate robustness in handling the semantic nuances and narrative structure of real-world clinical documentation. These findings support the potential deployment of Transformer-based systems as effective screening tools to prioritize patients at risk for cardiac amyloidosis in Spanish-speaking health care settings.
dc.description.departmentTecnologías de la Información
dc.description.sponsorshipInstitute of Health Carlos III, Ministry of Science, Innovation and Universities, Spanish Government (grant number PI20/01485)
dc.identifier.citationMata J, Pachón V, Manovel A, Maña M, de la Villa M. Multicriteria Optimization of Language Models for Heart Failure With Preserved Ejection Fraction Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study. J Med Internet Res 2025;27:e76433. URL: https://www.jmir.org/2025/1/e76433 DOI: 10.2196/76433
dc.identifier.doi10.2196/76433
dc.identifier.issn1438-8871
dc.identifier.urihttps://hdl.handle.net/10272/27737
dc.language.isoeng
dc.publisherJournal of Medical Internet Research
dc.relation.projectIDinfo:eu-repo/grantAgreement/ISCIII/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020 (ISCIII)/PI20%2F01485/ES/APLICACION DE TECNICAS DE PROCESAMIENTO DEL LENGUAJE NATURAL Y APRENDIZAJE AUTOMATICO A LA HISTORIA CLINICA DIGITALIZADA PARA SCREENING CLINICO DE AMILOIDOSIS CARDIACA./
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.accessRightsopen access
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subjectNatural Language Processing
dc.subjectTransformer
dc.subjectClinical Language Models
dc.subjectManual Corpus Annotation
dc.subjectSymptom Extraction
dc.subjectEarly Diagnosis Support
dc.subject.unesco1203.17 Informatics
dc.subject.unesco3205.01 Cardiology
dc.titleMulticriteria Optimization of Language Models for Heart Failure With Preserved Ejection Fraction Symptom Detection in Spanish Electronic Health Records: Comparative Modeling Study
dc.typejournal article
dc.type.hasVersionVoR
dspace.entity.typePublication
relation.isAuthorOfPublicationac76819b-d91a-4158-b947-4a9e827e5e9d
relation.isAuthorOfPublication47cb4892-3513-4d33-953c-8521bc9cb187
relation.isAuthorOfPublication8eb22794-136c-4b46-bb56-af4406eb26f3
relation.isAuthorOfPublicationc0061c50-20b4-46a2-9f33-5473c7e877f1
relation.isAuthorOfPublication.latestForDiscoveryac76819b-d91a-4158-b947-4a9e827e5e9d
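
The abstract names three optimization criteria (AUC, F1-score, sensitivity) but does not say how each was applied during fine-tuning. The following is a minimal sketch, assuming the criterion is used to pick the best validation checkpoint; the `Checkpoint` structure, the `select_checkpoint` helper, and all numbers are hypothetical illustrations, not the authors' implementation or results.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Validation results for one fine-tuning checkpoint (hypothetical structure)."""
    name: str
    auc: float  # assumed to be computed elsewhere from predicted scores
    tp: int     # true positives
    fp: int     # false positives
    tn: int     # true negatives
    fn: int     # false negatives

    @property
    def sensitivity(self) -> float:
        # Share of symptom-positive documents that were detected.
        total = self.tp + self.fn
        return self.tp / total if total else 0.0

    @property
    def specificity(self) -> float:
        # Share of symptom-negative documents correctly rejected.
        total = self.tn + self.fp
        return self.tn / total if total else 0.0

    @property
    def precision(self) -> float:
        total = self.tp + self.fp
        return self.tp / total if total else 0.0

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and sensitivity.
        p, r = self.precision, self.sensitivity
        return 2 * p * r / (p + r) if (p + r) else 0.0


# One scoring function per clinical priority named in the abstract.
CRITERIA = {
    "auc": lambda c: c.auc,
    "f1": lambda c: c.f1,
    "sensitivity": lambda c: c.sensitivity,
}


def select_checkpoint(checkpoints, criterion):
    """Return the checkpoint that maximizes the chosen criterion."""
    return max(checkpoints, key=CRITERIA[criterion])


# Made-up validation numbers, chosen only so the criteria disagree.
checkpoints = [
    Checkpoint("epoch_1", auc=0.98, tp=80, fp=10, tn=90, fn=20),
    Checkpoint("epoch_2", auc=0.97, tp=90, fp=30, tn=70, fn=10),
    Checkpoint("epoch_3", auc=0.96, tp=88, fp=8, tn=92, fn=12),
]

for criterion in CRITERIA:
    best = select_checkpoint(checkpoints, criterion)
    print(f"{criterion}: {best.name}")
```

With these illustrative numbers each criterion selects a different checkpoint, which is the kind of trade-off the three strategies are meant to expose: favoring sensitivity accepts more false positives in exchange for fewer missed cases.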

Files

Original bundle

Name:
jmir-2025-1-e76433-2.pdf
Size:
971.35 KB
Format:
Adobe Portable Document Format
Description:
Publisher's version
