<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-18T13:54:55Z</responseDate><request verb="GetRecord" identifier="oai:riubu.ubu.es:10259/11430" metadataPrefix="xoai">https://riubu.ubu.es/oai/request</request><GetRecord><record><header><identifier>oai:riubu.ubu.es:10259/11430</identifier><datestamp>2026-02-26T01:05:59Z</datestamp><setSpec>com_10259_5377</setSpec><setSpec>com_10259_5086</setSpec><setSpec>com_10259_2604</setSpec><setSpec>com_10259_4219</setSpec><setSpec>col_10259_5378</setSpec><setSpec>col_10259_4220</setSpec></header><metadata><metadata xmlns="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.lyncode.com/xoai http://www.lyncode.com/xsd/xoai.xsd">
<element name="dc">
<element name="contributor">
<element name="author">
<element name="none">
<field name="value">Olivares Gil, Alicia</field>
<field name="authority">749</field>
<field name="confidence">600</field>
<field name="orcid_id">0000-0002-3378-197X</field>
<field name="value">Barbero Aparicio, José Antonio</field>
<field name="authority">819</field>
<field name="confidence">600</field>
<field name="orcid_id">0000-0002-3269-0806</field>
<field name="value">Rodríguez Diez, Juan José</field>
<field name="authority">477</field>
<field name="confidence">600</field>
<field name="orcid_id">0000-0002-3291-2739</field>
<field name="value">Diez Pastor, José Francisco</field>
<field name="authority">156</field>
<field name="confidence">600</field>
<field name="orcid_id">0000-0001-5013-7505</field>
<field name="value">García Osorio, César</field>
<field name="authority">212</field>
<field name="confidence">600</field>
<field name="orcid_id">0000-0002-1206-1084</field>
<field name="value">Davari, Mehdi D.</field>
<field name="authority">71abb1c5-df2b-4933-8544-b775dd0c354d</field>
<field name="orcid_id"/>
</element>
</element>
</element>
<element name="date">
<element name="accessioned">
<element name="none">
<field name="value">2026-02-25T12:25:39Z</field>
</element>
</element>
<element name="available">
<element name="none">
<field name="value">2026-02-25T12:25:39Z</field>
</element>
</element>
<element name="issued">
<element name="none">
<field name="value">2025-12</field>
</element>
</element>
<element name="embargoEndDate">
<element name="none"/>
</element>
</element>
<element name="identifier">
<element name="issn">
<element name="none">
<field name="value">1758-2946</field>
</element>
</element>
<element name="uri">
<element name="none">
<field name="value">https://hdl.handle.net/10259/11430</field>
</element>
</element>
<element name="doi">
<element name="none">
<field name="value">10.1186/s13321-025-01029-w</field>
</element>
</element>
<element name="essn">
<element name="none">
<field name="value">1758-2946</field>
</element>
</element>
</element>
<element name="description">
<element name="abstract">
<element name="en">
<field name="value">Protein fitness prediction plays a crucial role in the advancement of protein engineering endeavours. However, the combinatorial complexity of the protein sequence space and the limited availability of assay-labelled data hinder the efficient optimization of protein properties. Data-driven strategies utilizing machine learning methods have emerged as a promising solution, yet their dependence on labelled training datasets poses a significant obstacle. To overcome this challenge, in this work, we explore various ways of introducing the latent information present in evolutionarily related sequences (homologous sequences) into the training process. To do so, we establish several strategies based on semi-supervised learning (unsupervised pre-processing and wrapper methods) and perform a comprehensive comparison using 19 datasets containing protein-fitness pairs. Our findings reveal that using the information present in the homologous sequences can improve the performance of the models, especially when the number of available labelled sequences is considerably low. Specifically, the combination of a sequence encoding method based on Direct Coupling Analysis (DCA), with MERGE (a hybrid regression framework that combines evolutionary information with supervised learning) and an SVM regressor, outperforms other encodings (PAM250, UniRep, eUniRep) and other semi-supervised wrapper methods (Tri-Training Regressor, Co-Training Regressor). In summary, the demonstrated performance gains of this strategy mark a substantial leap towards more robust and reliable predictive models for protein engineering tasks. This advancement holds the potential to streamline the design and optimisation of proteins for diverse applications in biotechnology and therapeutics.</field>
</element>
</element>
<element name="sponsorship">
<element name="en">
<field name="value">This work is supported by the Junta de Castilla y León under project BU055P20 (JCyL/FEDER, UE), and the Ministry of Science and Innovation of Spain under the project PID2020-119894GB-I00/AEI/10.13039/501100011033, co-financed through FEDER funds from the European Union. M.D. Davari acknowledges funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) (SPP2363). A. Olivares-Gil is supported through Consejería de Educación of the Junta de Castilla y León and the European Social Fund through a pre-doctoral grant (EDU/875/2021). The research stay of A. Olivares-Gil and J.A. Barbero-Aparicio was financed by the Mobility grants for doctoral students of the University of Burgos, co-financed by Banco Santander S.A. (2021/00001/007/001/018).</field>
</element>
</element>
</element>
<element name="format">
<element name="mimetype">
<element name="none">
<field name="value">application/pdf</field>
</element>
</element>
</element>
<element name="language">
<element name="iso">
<element name="es">
<field name="value">eng</field>
</element>
</element>
</element>
<element name="publisher">
<element name="es">
<field name="value">BioMed Central</field>
</element>
</element>
<element name="relation">
<element name="ispartof">
<element name="es">
<field name="value">Journal of Cheminformatics. 2025, V. 17, n. 1, 88</field>
</element>
</element>
<element name="publisherversion">
<element name="es">
<field name="value">https://doi.org/10.1186/s13321-025-01029-w</field>
</element>
</element>
</element>
<element name="rights">
<element name="*">
<field name="value">Attribution-NonCommercial-NoDerivatives 4.0 International</field>
</element>
<element name="uri">
<element name="*">
<field name="value">http://creativecommons.org/licenses/by-nc-nd/4.0/</field>
</element>
</element>
<element name="accessRights">
<element name="es">
<field name="value">info:eu-repo/semantics/openAccess</field>
</element>
</element>
</element>
<element name="subject">
<element name="en">
<field name="value">Machine learning</field>
<field name="value">Protein engineering</field>
<field name="value">Directed evolution</field>
<field name="value">Semi-supervised learning</field>
<field name="value">Protein design</field>
<field name="value">Tritraining regressor</field>
<field name="value">Generalized MERGE</field>
</element>
<element name="other">
<element name="es">
<field name="value">Proteínas</field>
<field name="value">Bioinformática</field>
</element>
<element name="en">
<field name="value">Proteins</field>
<field name="value">Bioinformatics</field>
</element>
</element>
</element>
<element name="title">
<element name="en">
<field name="value">Semi-supervised prediction of protein fitness for data-driven protein engineering</field>
</element>
</element>
<element name="type">
<element name="es">
<field name="value">info:eu-repo/semantics/article</field>
</element>
<element name="hasVersion">
<element name="es">
<field name="value">info:eu-repo/semantics/publishedVersion</field>
</element>
</element>
</element>
<element name="journal">
<element name="title">
<element name="es">
<field name="value">Journal of Cheminformatics</field>
</element>
</element>
</element>
<element name="volume">
<element name="number">
<element name="es">
<field name="value">17</field>
</element>
</element>
</element>
<element name="issue">
<element name="number">
<element name="es">
<field name="value">1</field>
</element>
</element>
</element>
</element>
<element name="bundles">
<element name="bundle">
<field name="name">THUMBNAIL</field>
<element name="bitstreams">
<element name="bitstream">
<field name="name">Olivares-jc_2025.pdf.jpg</field>
<field name="originalName">Olivares-jc_2025.pdf.jpg</field>
<field name="description">IM Thumbnail</field>
<field name="format">image/jpeg</field>
<field name="size">4835</field>
<field name="url">https://riubu.ubu.es/bitstream/10259/11430/4/Olivares-jc_2025.pdf.jpg</field>
<field name="checksum">3dabd98d18eff8d18e61dbfd6cb1ca93</field>
<field name="checksumAlgorithm">MD5</field>
<field name="sid">4</field>
</element>
</element>
</element>
<element name="bundle">
<field name="name">LICENSE</field>
<element name="bitstreams">
<element name="bitstream">
<field name="name">license.txt</field>
<field name="originalName">license.txt</field>
<field name="format">text/plain; charset=utf-8</field>
<field name="size">999</field>
<field name="url">https://riubu.ubu.es/bitstream/10259/11430/3/license.txt</field>
<field name="checksum">b295bcbce42e2caabeb0c623d3860c06</field>
<field name="checksumAlgorithm">MD5</field>
<field name="sid">3</field>
</element>
</element>
</element>
<element name="bundle">
<field name="name">CC-LICENSE</field>
<element name="bitstreams">
<element name="bitstream">
<field name="name">license_rdf</field>
<field name="originalName">license_rdf</field>
<field name="format">application/rdf+xml; charset=utf-8</field>
<field name="size">805</field>
<field name="url">https://riubu.ubu.es/bitstream/10259/11430/2/license_rdf</field>
<field name="checksum">4460e5956bc1d1639be9ae6146a50347</field>
<field name="checksumAlgorithm">MD5</field>
<field name="sid">2</field>
</element>
</element>
</element>
<element name="bundle">
<field name="name">ORIGINAL</field>
<element name="bitstreams">
<element name="bitstream">
<field name="name">Olivares-jc_2025.pdf</field>
<field name="originalName">Olivares-jc_2025.pdf</field>
<field name="description"/>
<field name="format">application/pdf</field>
<field name="size">2188067</field>
<field name="url">https://riubu.ubu.es/bitstream/10259/11430/1/Olivares-jc_2025.pdf</field>
<field name="checksum">fc0bbf0d7d4f0f76581c0fde3f5f247a</field>
<field name="checksumAlgorithm">MD5</field>
<field name="sid">1</field>
</element>
</element>
</element>
</element>
<element name="others">
<field name="handle">10259/11430</field>
<field name="identifier">oai:riubu.ubu.es:10259/11430</field>
<field name="lastModifyDate">2026-02-26 02:05:59.294</field>
</element>
<element name="repository">
<field name="name">Repositorio Institucional de la Universidad de Burgos</field>
<field name="mail">bubrep@ubu.es</field>
</element>
<element name="license">
<field name="bin">RWwgYXV0b3IgY29tbyDDum5pY28gdGl0dWxhciBkZSBsb3MgZGVyZWNob3MgZGUgcHJvcGllZGFkIGludGVsZWN0dWFsIGRlIGxhIG9icmEsIG8gZGlzcG9uaWVuZG8gZGUgbG9zIGRlYmlkb3MgcGVybWlzb3MgZGUgbG9zIG90cm9zIHRpdHVsYXJlcywgc2kgbG9zIGh1YmllcmEsIHkgZW4gdmlydHVkIGRlIGxvcyBkZXJlY2hvcyBxdWUgbGUgY29uZmllcmUgbGEgbGVnaXNsYWNpw7NuIHZpZ2VudGUgc29icmUgcHJvcGllZGFkIGludGVsZWN0dWFsIHkgZGVyZWNob3MgZGUgYXV0b3IsIA0KQVVUT1JJWkEgYSBsYSBVbml2ZXJzaWRhZCBkZSBCdXJnb3MgYSBkaWZ1bmRpciwgZGUgbWFuZXJhIGdyYXR1aXRhLCBlbCBjb250ZW5pZG8gZGUgbG9zIGFyY2hpdm9zIGRpZ2l0YWxlcyBxdWUgY29ycmVzcG9uZGVuIGFsIGRvY3VtZW50byBkZXNjcml0byBhbnRlcmlvcm1lbnRlLCBjb24gY2Fyw6FjdGVyIG5vIGV4Y2x1c2l2byB5IGRlIG1hbmVyYSBww7pibGljYSBlbiBhY2Nlc28gYWJpZXJ0byBhIHRyYXbDqXMgZGUgSW50ZXJuZXQsIHBhcmEgbG8gcXVlIGxhIEJpYmxpb3RlY2EgcHJvY2VkZXLDoSBhIGFyY2hpdmFybG9zIGVuIGVsIFJlcG9zaXRvcmlvIEluc3RpdHVjaW9uYWwuIEFzaW1pc21vIGF1dG9yaXphIGEgbGEgVW5pdmVyc2lkYWQgZGUgQnVyZ29zIGEgcmVhbGl6YXIgbGFzIHRyYW5zZm9ybWFjaW9uZXMgbmVjZXNhcmlhcyBkZSBmb3JtYXRvLCBubyBkZSBjb250ZW5pZG8sIHBhcmEgZ2FyYW50aXphciBsYSBwcmVzZXJ2YWNpw7NuIHkgZWwgYWNjZXNvIGVuIGVsIGZ1dHVyby4NCg0KRWwgYXV0b3IgZGlzcG9uZSwgZW4gdG9kbyBjYXNvLCBkZWwgZGVyZWNobyBhIHJldm9jYXIgZXN0YSBhdXRvcml6YWNpw7NuLg0KDQpMYSBjZXNpw7NuIGRlIGRlcmVjaG9zIGRlIGVzdGEgb2JyYSBzZSBlbmN1ZW50cmEgc3VqZXRhIGEgbGEgbGVnaXNsYWNpw7NuIHZpZ2VudGUgc29icmUgcHJvcGllZGFkIGludGVsZWN0dWFsIHkgZGVyZWNob3MgZGUgYXV0b3Iu</field>
</element>
</metadata></metadata></record></GetRecord></OAI-PMH>