Universidad de Burgos RIUBU
    Please use this identifier to cite or link to this item: https://hdl.handle.net/10259/11429

    Title
    Deep learning and support vector machines for transcription start site identification
    Authors
    Barbero Aparicio, José Antonio
    Olivares Gil, Alicia
    Diez Pastor, José Francisco
    García Osorio, César
    Published in
    PeerJ Computer Science. 2023, V. 9, e1340
    Publisher
    PeerJ
    Publication date
    2023-04
    ISSN
    2376-5992
    DOI
    10.7717/peerj-cs.1340
    Abstract
    Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have proven exceptionally effective for such tasks, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works neither compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied to related problems with remarkable results, we compared their performance in transcription start site prediction, concluding that SVMs are computationally much slower and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited than SVMs to working with sequences. For this purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets, including negative instances, for any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data. We also created a transcription start site (TSS) dataset large enough to be used in deep learning experiments.
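
    The abstract mentions generating training examples that include negative instances and dealing with class imbalance. The record does not describe the authors' actual procedure, so the following is only a minimal sketch under stated assumptions: positives are fixed-length windows centred on annotated TSS positions, negatives are windows centred on randomly chosen non-TSS positions, and the imbalance is controlled by a negatives-per-positive ratio. The function names, the `flank` and `neg_per_pos` parameters, and the toy sequence are all hypothetical.

```python
import random


def extract_window(sequence, center, flank):
    """Return the 2*flank+1 bases centred on `center`, or None if out of bounds."""
    start, end = center - flank, center + flank + 1
    if start < 0 or end > len(sequence):
        return None
    return sequence[start:end]


def build_dataset(sequence, tss_positions, flank=5, neg_per_pos=2, seed=0):
    """Build (window, label) pairs: label 1 for TSS-centred windows, 0 otherwise.

    Assumption: negatives are sampled uniformly from in-bounds positions that
    are not annotated as TSS. This is an illustrative strategy, not the
    paper's documented method.
    """
    rng = random.Random(seed)
    tss = set(tss_positions)

    # Positive examples: one window per annotated TSS that fits in the sequence.
    positives = []
    for position in sorted(tss):
        window = extract_window(sequence, position, flank)
        if window is not None:
            positives.append((window, 1))

    # Candidate negative centres: in-bounds positions that are not a TSS.
    candidates = [
        i for i in range(flank, len(sequence) - flank) if i not in tss
    ]
    n_negatives = min(len(candidates), neg_per_pos * len(positives))
    negatives = [
        (extract_window(sequence, i, flank), 0)
        for i in rng.sample(candidates, n_negatives)
    ]
    return positives + negatives


if __name__ == "__main__":
    toy_sequence = "ACGT" * 15  # hypothetical 60-base sequence
    for window, label in build_dataset(toy_sequence, [10, 30, 58]):
        print(label, window)
```

    Raising `neg_per_pos` makes the toy dataset more imbalanced, which is one simple way to probe how a classifier reacts to the skew the abstract refers to.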
    Keywords
    Transcription start site
    Bioinformatics
    Machine learning
    Deep learning
    Support vector machine
    Long short-term memory
    Convolutional neural network
    Subject
    Bioinformatics
    Machine learning
    URI
    https://hdl.handle.net/10259/11429
    Publisher's version
    https://doi.org/10.7717/peerj-cs.1340
    Appears in collections
    • Artículos ADMIRABLE
    Attribution 4.0 International
    Document(s) subject to a Creative Commons Attribution 4.0 International license
    Files in this item
    Name:
    Barbero-peerj_2023.pdf
    Size:
    455.8Kb
    Format:
    Adobe PDF