Universidad de Burgos RIUBU
    Please use this identifier to cite or link to this item: https://hdl.handle.net/10259/11430

    Title
    Semi-supervised prediction of protein fitness for data-driven protein engineering
    Authors
    Olivares Gil, Alicia
    Barbero Aparicio, José Antonio
    Rodríguez Diez, Juan José
    Diez Pastor, José Francisco
    García Osorio, César
    Davari, Mehdi D.
    Published in
    Journal of Cheminformatics. 2025, vol. 17, no. 1, 88
    Publisher
    BioMed Central
    Publication date
    2025-12
    ISSN
    1758-2946
    DOI
    10.1186/s13321-025-01029-w
    Abstract
    Protein fitness prediction plays a crucial role in the advancement of protein engineering endeavours. However, the combinatorial complexity of the protein sequence space and the limited availability of assay-labelled data hinder the efficient optimization of protein properties. Data-driven strategies utilizing machine learning methods have emerged as a promising solution, yet their dependence on labelled training datasets poses a significant obstacle. To overcome this challenge, in this work, we explore various ways of introducing the latent information present in evolutionarily related sequences (homologous sequences) into the training process. To do so, we establish several strategies based on semi-supervised learning (unsupervised pre-processing and wrapper methods) and perform a comprehensive comparison using 19 datasets containing protein-fitness pairs. Our findings reveal that using the information present in the homologous sequences can improve the performance of the models, especially when the number of available labelled sequences is considerably low. Specifically, the combination of a sequence encoding method based on Direct Coupling Analysis (DCA), with MERGE (a hybrid regression framework that combines evolutionary information with supervised learning) and an SVM regressor, outperforms other encodings (PAM250, UniRep, eUniRep) and other semi-supervised wrapper methods (Tri-Training Regressor, Co-Training Regressor). In summary, the demonstrated performance gains of this strategy mark a substantial leap towards more robust and reliable predictive models for protein engineering tasks. This advancement holds the potential to streamline the design and optimisation of proteins for diverse applications in biotechnology and therapeutics.
    Keywords
    Machine learning
    Protein engineering
    Directed evolution
    Semi-supervised learning
    Protein design
    Tritraining regressor
    Generalized MERGE
    Subject
    Proteins
    Bioinformatics
    URI
    https://hdl.handle.net/10259/11430
    Publisher's version
    https://doi.org/10.1186/s13321-025-01029-w
    Collections
    • Artículos ADMIRABLE
    License
    This document is subject to a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International license
    Files in this item
    Name: Olivares-jc_2025.pdf
    Size: 2.086Mb
    Format: Adobe PDF
