
dc.contributor.author: Barbero Aparicio, José Antonio
dc.contributor.author: Olivares Gil, Alicia
dc.contributor.author: Diez Pastor, José Francisco
dc.contributor.author: García Osorio, César
dc.date.accessioned: 2026-02-25T12:02:56Z
dc.date.available: 2026-02-25T12:02:56Z
dc.date.issued: 2023-04
dc.identifier.issn: 2376-5992
dc.identifier.uri: https://hdl.handle.net/10259/11429
dc.description.abstract: Recognizing transcription start sites is key to gene identification. Several approaches have been employed in related problems such as detecting translation initiation sites or promoters, many of the most recent ones based on machine learning. Deep learning methods have proven to be exceptionally effective for this task, but their use in transcription start site identification has not yet been explored in depth. Also, the very few existing works neither compare their methods to support vector machines (SVMs), the most established technique in this area of study, nor provide the curated dataset used in the study. The small number of published papers on this specific problem could be explained by this lack of datasets. Given that both support vector machines and deep neural networks have been applied in related problems with remarkable results, we compared their performance in transcription start site prediction, concluding that SVMs are computationally much slower, and that deep learning methods, especially long short-term memory neural networks (LSTMs), are better suited to work with sequences than SVMs. For this purpose, we used the reference human genome GRCh38. Additionally, we studied two different aspects related to data processing: the proper way to generate training examples and the imbalanced nature of the data. Furthermore, the generalization performance of the models studied was also tested using the mouse genome, where the LSTM neural network stood out from the rest of the algorithms. To sum up, this article provides an analysis of the best architecture choices in transcription start site identification, as well as a method to generate transcription start site datasets including negative instances on any species available in Ensembl. We found that deep learning methods are better suited than SVMs to solve this problem, being more efficient and better adapted to long sequences and large amounts of data. We also create a transcription start site (TSS) dataset large enough to be used in deep learning experiments. [en]
dc.description.sponsorship: This work has been supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE), by the Ministry of Science and Innovation under project PID2020-119894GB-I00, co-financed through European Union FEDER funds, and by Fundación Bancaria Caixa under project 2020/00062/001. José A. Barbero-Aparicio is funded through a pre-doctoral grant by the University of Burgos and Alicia Olivares-Gil is funded by the pre-doctoral grant from the Department of Education of Junta de Castilla y León (VA) (ORDEN EDU/875/2021) (Spain). NVIDIA Corporation donated the TITAN Xp GPUs used in this research. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. [en]
dc.format.mimetype: application/pdf
dc.language.iso: eng [es]
dc.publisher: PeerJ [es]
dc.relation.ispartof: PeerJ Computer Science. 2023, V. 9, e1340 [es]
dc.rights: Attribution 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [*]
dc.subject: Transcription start site [en]
dc.subject: Bioinformatics [en]
dc.subject: Machine learning [en]
dc.subject: Deep learning [en]
dc.subject: Support vector machine [en]
dc.subject: Long short-term memory [en]
dc.subject: Convolutional neural network [en]
dc.subject.other: Bioinformática [es]
dc.subject.other: Bioinformatics [en]
dc.subject.other: Aprendizaje automático [es]
dc.subject.other: Machine learning [en]
dc.title: Deep learning and support vector machines for transcription start site identification [en]
dc.type: info:eu-repo/semantics/article [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.relation.publisherversion: https://doi.org/10.7717/peerj-cs.1340 [es]
dc.identifier.doi: 10.7717/peerj-cs.1340
dc.identifier.essn: 2376-5992
dc.journal.title: PeerJ Computer Science [es]
dc.volume.number: 9 [es]
dc.page.initial: e1340 [es]
dc.type.hasVersion: info:eu-repo/semantics/publishedVersion [es]

