This README.txt file was generated on 2024-04-02 by Silvia Díaz de la Fuente

GENERAL INFORMATION 
-------------------
1. Dataset title: Dataset of Tweets on Assets of Cultural Interest Along the French Way in Castilla y León (2009-2023)

2. Authors:

Name: Silvia Díaz-de la Fuente
Institution: Departamento de Ingeniería de Organización. Universidad de Burgos. 
Email: sddelafuente@ubu.es
ORCID: https://orcid.org/0000-0002-5961-3368

Name: José Ignacio Santos
Institution: Departamento de Ingeniería de Organización. Universidad de Burgos 
Email: jisantos@ubu.es 
ORCID:  https://orcid.org/0000-0002-6653-043X 

Name: Virginia Ahedo
Institution: Departamento de Ingeniería de Organización. Universidad de Burgos 
Email: vahedo@ubu.es 
ORCID: https://orcid.org/0000-0002-9812-388X 

Name: María Pilar Alonso Abad
Institution: Departamento de Historia, Geografía y Comunicación. Universidad de Burgos. 
Email: mpaabad@ubu.es
ORCID: https://orcid.org/0000-0002-6268-9443

Name: José Manuel Galán 
Institution: Departamento de Ingeniería de Organización. Universidad de Burgos. 
Email: jmgalan@ubu.es 
ORCID: https://orcid.org/0000-0003-3360-7602


DESCRIPTION
-----------
1. Language: English

2. Abstract: 
This dataset comprises a comprehensive collection of tweets pertaining to Assets of Cultural Interest along the French Way in Castilla y León, spanning from January 1, 2009, to March 24, 2023. Assembled with the aid of the twarc2 Python package and academic access credentials, the dataset provides raw, unprocessed data for each cultural landmark, featuring a wide array of fields including text, date, language as identified by Twitter, author, number of retweets, and more.

3. Keywords:
Twitter Data, Cultural Heritage, French Way, Castilla y León, Social Media Analytics, Public Engagement, Tourism Trends, Community Impact, Historical Sites, Data Collection, Bienes de Interés Cultural, Assets of Cultural Interest, Linguistic Analysis, Real-time Analytics, Cultural Promotion, Heritage Management.

4. Date of data collection:
January 1, 2009 - March 24, 2023

5. Date of dataset publication:
April 2024

6. Funding:
The authors acknowledge the support and funding of the Spanish Ministry of Science and Innovation through its networks of excellence HAR2017-90883-REDC and RED2018-102518-T and the project PID2020-118906GB-I00, as well as of the Regional Government of Castilla y León - Department of Education (BDNS 425389) and the FWO-WOG (W001220N). This work has also been partially funded by the European Social Fund, through the predoctoral contract awarded to Silvia Díaz de la Fuente by the Department of Education of the Regional Government of Castilla y León. We acknowledge the Santander Supercomputacion support group at the University of Cantabria, who provided access to the Altamira supercomputer at the Institute of Physics of Cantabria (IFCA-CSIC), a member of the Spanish Supercomputing Network, for performing processing work.

7. Geographic location/s of data collection:
Castilla y León, Spain, specifically along the French Way of the Camino de Santiago.

ACCESS INFORMATION
------------------
1. Dataset Creative Commons License: 
CC BY-NC

2. Dataset DOI:

3. Related publication:
El Patrimonio Jacobeo y su gestión desde las Humanidades Digitales: presente y futuro del Camino de Santiago en Castilla y León. PhD dissertation of Silvia Díaz de la Fuente.

METHODOLOGICAL INFORMATION
--------------------------
The dataset was collected with the twarc2 Python package, using academic access credentials for the Twitter API. The search terms used for each Asset of Cultural Interest are listed in the "BIC_query" file included in the dataset. They consist primarily of simplified denominations of the sites, with articles and prepositions removed. Extending the search to English translations of the names was considered but ultimately deemed unnecessary, because proper names of monuments are typically not translated in social media contexts. Limiting the search to Spanish terms also minimized the confusion and errors that translations could introduce.
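
As an illustration only, the simplification of denominations described above could be sketched in Python as follows. The stop-word list, the function names, and the example name are assumptions made for this sketch; the terms actually used for data collection are the ones listed in the "BIC_query" file. Accent removal follows the description of BIC_query.ods in the file overview.

```python
import unicodedata

# Assumption: a small, hypothetical list of Spanish articles and
# prepositions (including the contractions "del" and "al").
STOPWORDS = {
    "el", "la", "los", "las", "un", "una", "unos", "unas",
    "de", "del", "a", "al", "en", "con", "por", "para", "sobre",
}


def strip_accents(text):
    """Remove diacritics, e.g. 'León' becomes 'Leon'.

    Note: this simple approach also turns 'ñ' into 'n'.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def simplify_name(name):
    """Drop accents, articles and prepositions from a denomination."""
    words = strip_accents(name).split()
    return " ".join(w for w in words if w.lower() not in STOPWORDS)


# Hypothetical input, not necessarily a term from the BIC_query file.
print(simplify_name("Catedral de Santa María de Burgos"))
```

The result of such a function is only a starting point; a term would still need to be checked by hand for ambiguity before being used as a search query.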

FILE OVERVIEW
-------------
1. README.txt: This text file serves as a comprehensive guide to the dataset, offering an overview of the dataset's purpose, the methodology of data collection, a brief description of the files included, acknowledgments for support and funding, and any additional notes that users might find helpful for understanding or utilizing the dataset effectively.

2. Twitter_bics.zip: A compressed ZIP file containing the raw dataset in CSV format, named "Twitter_bics.csv" once extracted. Compression reduces the file size for more efficient downloading and storage. The CSV file holds the raw tweet data, including fields such as text, date, language, author, number of retweets, and more, for the Assets of Cultural Interest along the French Way in Castilla y León. Tweets were collected solely on the basis of the terms listed in the BIC_query files.

3. BIC_query.ods: This file, in OpenDocument Spreadsheet format, lists the specific search terms used to gather tweets related to the Assets of Cultural Interest along the French Way in Castilla y León. The terms are simplified, omitting accents, articles and prepositions to enhance search accuracy and relevance.

4. BIC_query.xlsx: An Excel version of the BIC_query.ods file, providing the same list of search terms used for data collection. This format offers compatibility for users who prefer Microsoft Excel for data review and analysis.
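
The following is a minimal sketch, not part of the original workflow, of how Twitter_bics.csv might be read directly from Twitter_bics.zip using only the Python standard library. It assumes that the archive sits in the working directory under the name given above and that the CSV is comma-separated and UTF-8 encoded; iter_tweets is a hypothetical helper name.

```python
import csv
import io
import zipfile


def iter_tweets(zip_path="Twitter_bics.zip", member="Twitter_bics.csv"):
    """Yield one dict per tweet, keyed by the CSV column names.

    Assumptions: comma-separated values, UTF-8 encoding (a leading
    byte-order mark is tolerated), header in the first row.
    """
    with zipfile.ZipFile(zip_path) as archive:
        with archive.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            # csv.DictReader returns every field as a string.
            yield from csv.DictReader(text)


# Example usage (requires the archive to be present):
# for row in iter_tweets():
#     print(row["id"], row["created_at"], row["bic_name"])
#     break
```

Reading every field as text keeps the long numeric identifiers (id, author_id, and so on) intact; converting them to floating-point numbers would lose digits.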


TABULAR DATA-SPECIFIC INFORMATION
---------------------------------

id: Unique identifier for the tweet.
conversation_id: Identifier for the conversation thread the tweet is a part of.
referenced_tweets.replied_to.id: ID of the original tweet if the captured tweet is a reply.
referenced_tweets.retweeted.id: ID of the original tweet if the captured tweet is a retweet.
referenced_tweets.quoted.id: ID of the original tweet if the captured tweet quotes another tweet.
author_id: Unique identifier for the author of the tweet.
in_reply_to_user_id: User ID that the tweet is in reply to, if applicable.
in_reply_to_username: Username that the tweet is in reply to, if applicable.
retweeted_user_id: User ID of the original author if the tweet is a retweet.
retweeted_username: Username of the original author if the tweet is a retweet.
quoted_user_id: User ID of the author of a quoted tweet.
quoted_username: Username of the author of a quoted tweet.
created_at: Timestamp of when the tweet was created.
text: The text content of the tweet.
lang: Language code of the tweet.
source: Platform or method used to post the tweet.
public_metrics.impression_count: Number of times the tweet has been viewed.
public_metrics.reply_count: Number of replies to the tweet.
public_metrics.retweet_count: Number of times the tweet has been retweeted.
public_metrics.quote_count: Number of times the tweet has been quoted.
public_metrics.like_count: Number of likes for the tweet.
reply_settings: Information on who can reply to the tweet.
edit_history_tweet_ids: IDs of all versions of the tweet (the original and any edits).
edit_controls.edits_remaining: Number of edits remaining for the tweet, if editable.
edit_controls.editable_until: Timestamp until when the tweet can be edited.
edit_controls.is_edit_eligible: Indicates if the tweet is eligible for editing.
possibly_sensitive: Flags if the tweet may contain sensitive content.
withheld.scope: Scope of content withheld (if any).
withheld.copyright: Indicates if the tweet is withheld due to copyright reasons.
withheld.country_codes: Country codes where the tweet is withheld.
entities.annotations: Named entities (such as people, places or organizations) that Twitter identified in the tweet text.
entities.cashtags: List of financial symbols or stock ticker symbols mentioned in the tweet.
entities.hashtags: List of hashtags mentioned in the tweet.
entities.mentions: List of user mentions in the tweet.
entities.urls: List of URLs included in the tweet.
context_annotations: Topic labels (domain and entity pairs) that Twitter inferred for the tweet.
attachments.media: Information about media attached to the tweet.
attachments.media_keys: Keys identifying media attached to the tweet.
attachments.poll.duration_minutes: Duration of the poll attached to the tweet in minutes.
attachments.poll.end_datetime: End datetime for the poll.
attachments.poll.id: Unique identifier for the poll.
attachments.poll.options: Options for the poll.
attachments.poll.voting_status: Status of the poll, whether open or closed.
attachments.poll_ids: Identifiers for polls attached to the tweet.
author.id: Unique identifier for the author of the tweet.
author.created_at: Timestamp when the author's account was created.
author.username: Username of the author.
author.name: Name of the author.
author.description: Description provided by the author in their profile.
author.entities.description.cashtags: Financial symbols in the author's profile description.
author.entities.description.hashtags: Hashtags in the author's profile description.
author.entities.description.mentions: Mentions in the author's profile description.
author.entities.description.urls: URLs in the author's profile description.
author.entities.url.urls: URLs associated with the author's account.
author.url: URL provided by the author in their profile.
author.location: Location provided by the author in their profile.
author.pinned_tweet_id: ID of the tweet pinned by the author on their profile.
author.profile_image_url: URL of the author's profile image.
author.protected: Indicates if the author's account is protected.
author.public_metrics.followers_count: Number of followers the author has.
author.public_metrics.following_count: Number of accounts the author is following.
author.public_metrics.listed_count: Number of public lists that include the author.
author.public_metrics.tweet_count: Total number of tweets the author has posted.
author.verified: Indicates if the author's account is verified.
author.verified_type: Type of verification the author's account has.
author.withheld.scope: Scope of the author's withheld content, if any.
author.withheld.copyright: Indicates if the author's content is withheld due to copyright.
author.withheld.country_codes: Country codes where the author's content is withheld.
geo.coordinates.coordinates: Geographical coordinates associated with the tweet.
geo.coordinates.type: Type of geographical data provided.
geo.country: Country associated with the tweet's location.
geo.country_code: Country code associated with the tweet's location.
geo.full_name: Full name of the location associated with the tweet.
geo.geo.bbox: Bounding box of the location as specified in the tweet.
geo.geo.type: Type of geographical location specified.
geo.id: Unique identifier for the geographical location.
geo.name: Name of the geographical location.
geo.place_id: Identifier for the place mentioned in the tweet.
geo.place_type: Type of place mentioned in the tweet.
matching_rules: Rules that were matched for the tweet to be collected.
X__twarc.retrieved_at: Timestamp when the tweet was retrieved by twarc.
X__twarc.url: URL of the API request that twarc used to retrieve the tweet.
X__twarc.version: Version of twarc used for data collection.
numero_archivo: File number ("número de archivo" in Spanish), potentially for organizational or reference purposes.
mes: Month ("mes" in Spanish) when the tweet was posted or collected.
bic_name: Name of the Asset of Cultural Interest related to the tweet.
bic_query: The specific query term used to collect the tweet.
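
As a hedged example of how the variables above might be combined, the sketch below classifies each row as a retweet, quote, reply or original tweet from the referenced_tweets.* columns and counts tweets per bic_name. It assumes that rows are dictionaries of strings (for instance, as produced by Python's csv.DictReader) and that a missing value appears as an empty cell or as the text "NA"; both the missing-value convention and the function names are assumptions, not documented properties of the dataset.

```python
from collections import Counter

# Assumption: how missing values may be encoded in the CSV.
MISSING = {None, "", "NA"}


def tweet_type(row):
    """Return 'retweet', 'quote', 'reply' or 'original' for one row."""
    if row.get("referenced_tweets.retweeted.id") not in MISSING:
        return "retweet"
    if row.get("referenced_tweets.quoted.id") not in MISSING:
        return "quote"
    if row.get("referenced_tweets.replied_to.id") not in MISSING:
        return "reply"
    return "original"


def count_by_asset(rows, include_retweets=False):
    """Count tweets per Asset of Cultural Interest (bic_name)."""
    counts = Counter()
    for row in rows:
        if not include_retweets and tweet_type(row) == "retweet":
            continue
        counts[row["bic_name"]] += 1
    return counts
```

Excluding retweets by default is a design choice for this sketch: a retweet repeats the text of another tweet, so counting it can overstate how much original content mentions an asset.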