<?xml version="1.0" encoding="UTF-8"?><?xml-stylesheet type="text/xsl" href="static/style.xsl"?><OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd"><responseDate>2026-04-18T21:08:16Z</responseDate><request verb="GetRecord" identifier="oai:riubu.ubu.es:10259/11430" metadataPrefix="mets">https://riubu.ubu.es/oai/request</request><GetRecord><record><header><identifier>oai:riubu.ubu.es:10259/11430</identifier><datestamp>2026-02-26T01:05:59Z</datestamp><setSpec>com_10259_5377</setSpec><setSpec>com_10259_5086</setSpec><setSpec>com_10259_2604</setSpec><setSpec>com_10259_4219</setSpec><setSpec>col_10259_5378</setSpec><setSpec>col_10259_4220</setSpec></header><metadata><mets xmlns="http://www.loc.gov/METS/" xmlns:doc="http://www.lyncode.com/xoai" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd" PROFILE="DSpace METS SIP Profile 1.0" TYPE="DSpace ITEM" ID="DSpace_ITEM_10259-11430" OBJID="hdl:10259/11430">
<metsHdr CREATEDATE="2026-04-18T23:08:16Z">
<agent TYPE="ORGANIZATION" ROLE="CUSTODIAN">
<name>Repositorio Institucional de la Universidad de Burgos</name>
</agent>
</metsHdr>
<dmdSec ID="DMD_10259_11430">
<mdWrap MDTYPE="MODS">
<xmlData xmlns:mods="http://www.loc.gov/mods/v3" xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
<mods:mods xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-1.xsd">
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Olivares Gil, Alicia</mods:namePart>
</mods:name>
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Barbero Aparicio, José Antonio</mods:namePart>
</mods:name>
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Rodríguez Diez, Juan José</mods:namePart>
</mods:name>
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Diez Pastor, José Francisco</mods:namePart>
</mods:name>
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>García Osorio, César</mods:namePart>
</mods:name>
<mods:name>
<mods:role>
<mods:roleTerm type="text">author</mods:roleTerm>
</mods:role>
<mods:namePart>Davari, Mehdi D.</mods:namePart>
</mods:name>
<mods:extension>
<mods:dateAccessioned encoding="iso8601">2026-02-25T12:25:39Z</mods:dateAccessioned>
</mods:extension>
<mods:extension>
<mods:dateAvailable encoding="iso8601">2026-02-25T12:25:39Z</mods:dateAvailable>
</mods:extension>
<mods:originInfo>
<mods:dateIssued encoding="iso8601">2025-12</mods:dateIssued>
</mods:originInfo>
<mods:identifier type="issn">1758-2946</mods:identifier>
<mods:identifier type="uri">https://hdl.handle.net/10259/11430</mods:identifier>
<mods:identifier type="doi">10.1186/s13321-025-01029-w</mods:identifier>
<mods:identifier type="essn">1758-2946</mods:identifier>
<mods:abstract>Protein fitness prediction plays a crucial role in the advancement of protein engineering endeavours. However, the combinatorial complexity of the protein sequence space and the limited availability of assay-labelled data hinder the efficient optimization of protein properties. Data-driven strategies utilizing machine learning methods have emerged as a promising solution, yet their dependence on labelled training datasets poses a significant obstacle. To overcome this challenge, in this work, we explore various ways of introducing the latent information present in evolutionarily related sequences (homologous sequences) into the training process. To do so, we establish several strategies based on semi-supervised learning (unsupervised pre-processing and wrapper methods) and perform a comprehensive comparison using 19 datasets containing protein-fitness pairs. Our findings reveal that using the information present in the homologous sequences can improve the performance of the models, especially when the number of available labelled sequences is considerably low. Specifically, the combination of a sequence encoding method based on Direct Coupling Analysis (DCA), with MERGE (a hybrid regression framework that combines evolutionary information with supervised learning) and an SVM regressor, outperforms other encodings (PAM250, UniRep, eUniRep) and other semi-supervised wrapper methods (Tri-Training Regressor, Co-Training Regressor). In summary, the demonstrated performance gains of this strategy mark a substantial leap towards more robust and reliable predictive models for protein engineering tasks. This advancement holds the potential to streamline the design and optimisation of proteins for diverse applications in biotechnology and therapeutics.</mods:abstract>
<mods:language>
<mods:languageTerm authority="iso639-2b">eng</mods:languageTerm>
</mods:language>
<mods:accessCondition type="useAndReproduction">Attribution-NonCommercial-NoDerivatives 4.0 International</mods:accessCondition>
<mods:subject>
<mods:topic>Machine learning</mods:topic>
</mods:subject>
<mods:subject>
<mods:topic>Protein engineering</mods:topic>
</mods:subject>
<mods:subject>
<mods:topic>Directed evolution</mods:topic>
</mods:subject>
<mods:subject>
<mods:topic>Semi-supervised learning</mods:topic>
</mods:subject>
<mods:subject>
<mods:topic>Protein design</mods:topic>
</mods:subject>
<mods:subject>
<mods:topic>Tritraining regressor</mods:topic>
</mods:subject>
<mods:subject>
<mods:topic>Generalized MERGE</mods:topic>
</mods:subject>
<mods:titleInfo>
<mods:title>Semi-supervised prediction of protein fitness for data-driven protein engineering</mods:title>
</mods:titleInfo>
<mods:genre>info:eu-repo/semantics/article</mods:genre>
</mods:mods>
</xmlData>
</mdWrap>
</dmdSec>
<amdSec ID="TMD_10259_11430">
<rightsMD ID="RIG_10259_11430">
<mdWrap OTHERMDTYPE="DSpaceDepositLicense" MDTYPE="OTHER" MIMETYPE="text/plain">
<binData>RWwgYXV0b3IgY29tbyDDum5pY28gdGl0dWxhciBkZSBsb3MgZGVyZWNob3MgZGUgcHJvcGllZGFkIGludGVsZWN0dWFsIGRlIGxhIG9icmEsIG8gZGlzcG9uaWVuZG8gZGUgbG9zIGRlYmlkb3MgcGVybWlzb3MgZGUgbG9zIG90cm9zIHRpdHVsYXJlcywgc2kgbG9zIGh1YmllcmEsIHkgZW4gdmlydHVkIGRlIGxvcyBkZXJlY2hvcyBxdWUgbGUgY29uZmllcmUgbGEgbGVnaXNsYWNpw7NuIHZpZ2VudGUgc29icmUgcHJvcGllZGFkIGludGVsZWN0dWFsIHkgZGVyZWNob3MgZGUgYXV0b3IsIA0KQVVUT1JJWkEgYSBsYSBVbml2ZXJzaWRhZCBkZSBCdXJnb3MgYSBkaWZ1bmRpciwgZGUgbWFuZXJhIGdyYXR1aXRhLCBlbCBjb250ZW5pZG8gZGUgbG9zIGFyY2hpdm9zIGRpZ2l0YWxlcyBxdWUgY29ycmVzcG9uZGVuIGFsIGRvY3VtZW50byBkZXNjcml0byBhbnRlcmlvcm1lbnRlLCBjb24gY2Fyw6FjdGVyIG5vIGV4Y2x1c2l2byB5IGRlIG1hbmVyYSBww7pibGljYSBlbiBhY2Nlc28gYWJpZXJ0byBhIHRyYXbDqXMgZGUgSW50ZXJuZXQsIHBhcmEgbG8gcXVlIGxhIEJpYmxpb3RlY2EgcHJvY2VkZXLDoSBhIGFyY2hpdmFybG9zIGVuIGVsIFJlcG9zaXRvcmlvIEluc3RpdHVjaW9uYWwuIEFzaW1pc21vIGF1dG9yaXphIGEgbGEgVW5pdmVyc2lkYWQgZGUgQnVyZ29zIGEgcmVhbGl6YXIgbGFzIHRyYW5zZm9ybWFjaW9uZXMgbmVjZXNhcmlhcyBkZSBmb3JtYXRvLCBubyBkZSBjb250ZW5pZG8sIHBhcmEgZ2FyYW50aXphciBsYSBwcmVzZXJ2YWNpw7NuIHkgZWwgYWNjZXNvIGVuIGVsIGZ1dHVyby4NCg0KRWwgYXV0b3IgZGlzcG9uZSwgZW4gdG9kbyBjYXNvLCBkZWwgZGVyZWNobyBhIHJldm9jYXIgZXN0YSBhdXRvcml6YWNpw7NuLg0KDQpMYSBjZXNpw7NuIGRlIGRlcmVjaG9zIGRlIGVzdGEgb2JyYSBzZSBlbmN1ZW50cmEgc3VqZXRhIGEgbGEgbGVnaXNsYWNpw7NuIHZpZ2VudGUgc29icmUgcHJvcGllZGFkIGludGVsZWN0dWFsIHkgZGVyZWNob3MgZGUgYXV0b3Iu</binData>
</mdWrap>
</rightsMD>
</amdSec>
<amdSec ID="FO_10259_11430_1">
<techMD ID="TECH_O_10259_11430_1">
<mdWrap MDTYPE="PREMIS">
<xmlData xmlns:premis="http://www.loc.gov/standards/premis" xsi:schemaLocation="http://www.loc.gov/standards/premis http://www.loc.gov/standards/premis/PREMIS-v1-0.xsd">
<premis:premis>
<premis:object>
<premis:objectIdentifier>
<premis:objectIdentifierType>URL</premis:objectIdentifierType>
<premis:objectIdentifierValue>https://riubu.ubu.es/bitstream/10259/11430/1/Olivares-jc_2025.pdf</premis:objectIdentifierValue>
</premis:objectIdentifier>
<premis:objectCategory>File</premis:objectCategory>
<premis:objectCharacteristics>
<premis:fixity>
<premis:messageDigestAlgorithm>MD5</premis:messageDigestAlgorithm>
<premis:messageDigest>fc0bbf0d7d4f0f76581c0fde3f5f247a</premis:messageDigest>
</premis:fixity>
<premis:size>2188067</premis:size>
<premis:format>
<premis:formatDesignation>
<premis:formatName>application/pdf</premis:formatName>
</premis:formatDesignation>
</premis:format>
</premis:objectCharacteristics>
<premis:originalName>Olivares-jc_2025.pdf</premis:originalName>
</premis:object>
</premis:premis>
</xmlData>
</mdWrap>
</techMD>
</amdSec>
<fileSec>
<fileGrp USE="ORIGINAL">
<file ID="BITSTREAM_ORIGINAL_10259_11430_1" MIMETYPE="application/pdf" SEQ="1" SIZE="2188067" CHECKSUM="fc0bbf0d7d4f0f76581c0fde3f5f247a" CHECKSUMTYPE="MD5" ADMID="FO_10259_11430_1" GROUPID="GROUP_BITSTREAM_10259_11430_1">
<FLocat xlink:type="simple" LOCTYPE="URL" xlink:href="https://riubu.ubu.es/bitstream/10259/11430/1/Olivares-jc_2025.pdf"/>
</file>
</fileGrp>
</fileSec>
<structMap TYPE="LOGICAL" LABEL="DSpace Object">
<div TYPE="DSpace Object Contents" DMDID="DMD_10259_11430">
<div TYPE="DSpace BITSTREAM">
<fptr FILEID="BITSTREAM_ORIGINAL_10259_11430_1"/>
</div>
</div>
</structMap>
</mets></metadata></record></GetRecord></OAI-PMH>