Universidad de Burgos RIUBU
    Please use this identifier to cite or link to this item: https://hdl.handle.net/10259/11430

    Title
    Semi-supervised prediction of protein fitness for data-driven protein engineering
    Authors
    Olivares Gil, Alicia
    Barbero Aparicio, José Antonio
    Rodríguez Diez, Juan José
    Diez Pastor, José Francisco
    García Osorio, César
    Davari, Mehdi D.
    Published in
    Journal of Cheminformatics. 2025, vol. 17, no. 1, 88
    Publisher
    BioMed Central
    Publication date
    2025-12
    ISSN
    1758-2946
    DOI
    10.1186/s13321-025-01029-w
    Abstract
    Protein fitness prediction plays a crucial role in the advancement of protein engineering endeavours. However, the combinatorial complexity of the protein sequence space and the limited availability of assay-labelled data hinder the efficient optimization of protein properties. Data-driven strategies utilizing machine learning methods have emerged as a promising solution, yet their dependence on labelled training datasets poses a significant obstacle. To overcome this challenge, in this work, we explore various ways of introducing the latent information present in evolutionarily related sequences (homologous sequences) into the training process. To do so, we establish several strategies based on semi-supervised learning (unsupervised pre-processing and wrapper methods) and perform a comprehensive comparison using 19 datasets containing protein-fitness pairs. Our findings reveal that using the information present in the homologous sequences can improve the performance of the models, especially when the number of available labelled sequences is considerably low. Specifically, the combination of a sequence encoding method based on Direct Coupling Analysis (DCA), with MERGE (a hybrid regression framework that combines evolutionary information with supervised learning) and an SVM regressor, outperforms other encodings (PAM250, UniRep, eUniRep) and other semi-supervised wrapper methods (Tri-Training Regressor, Co-Training Regressor). In summary, the demonstrated performance gains of this strategy mark a substantial leap towards more robust and reliable predictive models for protein engineering tasks. This advancement holds the potential to streamline the design and optimisation of proteins for diverse applications in biotechnology and therapeutics.
    Keywords
    Machine learning
    Protein engineering
    Directed evolution
    Semi-supervised learning
    Protein design
    Tritraining regressor
    Generalized MERGE
    Subject
    Proteins
    Bioinformatics
    URI
    https://hdl.handle.net/10259/11430
    Publisher's version
    https://doi.org/10.1186/s13321-025-01029-w
    Appears in collections
    • Artículos ADMIRABLE
    License
    Document(s) subject to a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license
    Files in this item
    Name: Olivares-jc_2025.pdf
    Size: 2.086 MB
    Format: Adobe PDF