Universidad de Burgos RIUBU
    Please use this identifier to cite or link to this item: http://hdl.handle.net/10259/9281

    Title
    Addressing data scarcity in protein fitness landscape analysis: A study on semi-supervised and deep transfer learning techniques
    Authors
    Barbero Aparicio, José Antonio
    Olivares Gil, Alicia
    Rodríguez Diez, Juan José
    García Osorio, César
    Diez Pastor, José Francisco
    Published in
    Information Fusion. 2024, vol. 102, 102035
    Publisher
    Elsevier
    Publication date
    2024-02
    ISSN
    1566-2535
    DOI
    10.1016/j.inffus.2023.102035
    Abstract
    This paper presents a comprehensive analysis of deep transfer learning, supervised, and semi-supervised methods for protein fitness prediction, with a focus on small datasets. The analysis also explores combining different data sources to enhance model performance. While deep learning and deep transfer learning methods have shown remarkable performance when data is abundant, this study addresses the more realistic scenario faced by wet lab researchers, where labeled data is often limited. The novelty of this work lies in its examination of deep transfer learning on small datasets and its consideration of semi-supervised methods and multi-view strategies. Previous research has extensively explored deep transfer learning in large dataset scenarios, but little attention has been given to its efficacy in small dataset settings or to its comparison with semi-supervised approaches. Our findings suggest that deep transfer learning, exemplified by ProteinBERT, performs well relative to the other methods across various evaluation metrics, not only in small dataset contexts but also in large dataset scenarios. This highlights the robustness and versatility of deep transfer learning in protein fitness prediction tasks, even with limited labeled data, and points to its potential as a state-of-the-art approach in the field. By leveraging pre-trained models and fine-tuning them on small datasets, researchers can achieve competitive performance that surpasses traditional supervised and semi-supervised methods. These findings can help wet lab researchers who face limited labeled data make informed decisions when selecting a methodology for their specific protein fitness prediction tasks.
    Additionally, the study investigated the combination of two different sources of information (encodings) through our enhanced semi-supervised methods; these improved on their base model and offer insights for further research. The presented analysis contributes to a better understanding of the capabilities and limitations of different learning approaches in small dataset scenarios, ultimately aiding the development of improved protein fitness prediction methods.
    Keywords
    Bioinformatics
    Machine learning
    Transfer learning
    Semi-supervised learning
    Protein fitness prediction
    Small datasets
    Subject
    Computer science
    Bioinformatics
    URI
    http://hdl.handle.net/10259/9281
    Publisher's version
    https://doi.org/10.1016/j.inffus.2023.102035
    Appears in collections
    • Artículos BEST-AI
    • Artículos ADMIRABLE
    License
    Document(s) subject to a Creative Commons Attribution-NonCommercial 4.0 International license
    Files in this item
    Name: Barbero-if_2024.pdf
    Size: 727.8Kb
    Format: Adobe PDF
