Metadata and documentation

Good data documentation and metadata are essential for making your research reproducible and understandable to others, and to your future self when you work with the same dataset again.

Metadata

What is metadata? 

Metadata is “data about data”: descriptions of, or information about, an object, an entity, a variable or a data point. The primary purpose of metadata is to describe or annotate the data it accompanies in a machine-readable way.

What are examples of metadata? 

We can find lots of metadata in everyday life. If we look at the last call we made from our mobile phone, we can see the exact time of the call, its duration, and whom we called. This is all metadata about the call. Even though we have not recorded the call itself, we can still extract a lot of information from these descriptions.

Metadata can also function as data depending on what they are used for. For example, if somebody is studying the average duration of phone calls, our call’s metadata will be used as data in that project.  
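To make the phone-call example concrete, here is a minimal Python sketch in which call metadata is treated as data. The call log, its field names (`started`, `duration_s`, `callee`) and the function `average_duration` are hypothetical and only serve as an illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical call log: each record holds only metadata about a call,
# not the content of the call itself.
call_log = [
    {"started": datetime(2023, 4, 12, 9, 15), "duration_s": 120, "callee": "Alice"},
    {"started": datetime(2023, 4, 12, 13, 40), "duration_s": 300, "callee": "Bob"},
    {"started": datetime(2023, 4, 13, 8, 5), "duration_s": 60, "callee": "Alice"},
]


def average_duration(calls):
    """Return the mean call duration in seconds."""
    return mean(call["duration_s"] for call in calls)


# In a study of call durations, the metadata becomes the data being analysed.
print(average_duration(call_log))
```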

What are different types of metadata? 

There are different types of metadata. We primarily distinguish between:

  • Administrative metadata: information about the project that is relevant for managing it (e.g. the project’s licenses and access rights)
  • Descriptive or citation metadata: information that allows others to find the dataset (e.g. the dataset’s author, title and keywords)
  • Structural metadata: information about how the dataset is internally structured (e.g. the names of the variables in the dataset)
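As an illustration of the three types, the following Python sketch groups metadata for one dataset and derives the structural part (the variable names) from the header row of a CSV file. The dataset contents, the field values and the function `variable_names` are assumptions made up for this example, not a prescribed format.

```python
import csv
import io

# Hypothetical dataset contents; in practice this would be a file on disk.
raw_csv = "participant_id,age,reaction_time_ms\n1,34,512\n2,29,478\n"


def variable_names(csv_text):
    """Return the column names from the header row of CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


# One possible way to group the three metadata types (values are invented).
metadata = {
    "administrative": {"license": "CC BY 4.0", "access": "open"},
    "descriptive": {
        "title": "Reaction time study",
        "author": "Jane Doe",
        "keywords": ["reaction time", "attention"],
    },
    "structural": {"variables": variable_names(raw_csv)},
}

print(metadata["structural"]["variables"])
```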

How do I create metadata for my project? 

To create basic metadata for your project or a single dataset, you can follow the fifteen “core” metadata elements developed by the Dublin Core Metadata Initiative (DCMI), a global initiative to formally standardize metadata terms. The elements include terms such as “Title”, specifying the title of your dataset or a resource, “Subject”, specifying the main topic of the resource, or “Description”, specifying the content of the resource.
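A basic record built on the fifteen Dublin Core elements could be sketched in Python as follows. The element names are the DCMI core elements; the helper `make_record` and all field values are hypothetical, and saving the record as JSON is just one convenient machine-readable option, not something DCMI requires.

```python
import json

# The fifteen elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def make_record(**fields):
    """Build a metadata record, rejecting names that are not core elements."""
    unknown = set(fields) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"Not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


# Hypothetical example values.
record = make_record(
    title="Reaction time study: raw measurements",
    creator="Jane Doe",
    subject="Cognitive psychology",
    description="Reaction times collected from adult participants.",
    date="2023-04-12",
    language="en",
    rights="CC BY 4.0",
)

# Serialize the record so that both people and software can read it.
print(json.dumps(record, indent=2))
```

Not every element has to be filled in; you can start with the ones that apply to your dataset and add more as the project develops.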

How do I find discipline-specific metadata standards? 

Besides these core elements, different disciplines have their own standards for creating and structuring metadata. Make sure you familiarize yourself with the existing discipline-specific metadata standards (e.g. by searching the RDA database of metadata standards) before you create metadata for your project.

Documentation

What is documentation? 

Documentation is a record of how the data are handled in the project and of the project workflows. Clear and systematic documentation improves research quality, so it is important to create it in addition to metadata.

What are examples of documentation? 

For example, you can create a text file in the raw data folder that says “All data files in this directory are raw data files that should not be altered. For processed or edited data files, go to the data/processed directory”. In this way, you tell others and yourself which types of data are stored where and how to handle them. Backing up any notes about the study or project progress (e.g. lab notebooks or field journals) is also considered documentation.
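If you prefer to create such a notice with a script, a minimal Python sketch could look like this. The folder layout `data/raw`, the file name `README.txt` and the function `write_raw_data_notice` are assumptions for illustration; adapt them to your own project structure.

```python
import tempfile
from pathlib import Path

NOTICE = (
    "All data files in this directory are raw data files that should not be "
    "altered. For processed or edited data files, go to the data/processed "
    "directory.\n"
)


def write_raw_data_notice(project_root):
    """Create data/raw/README.txt under project_root and return its path."""
    raw_dir = Path(project_root) / "data" / "raw"  # assumed folder layout
    raw_dir.mkdir(parents=True, exist_ok=True)
    readme = raw_dir / "README.txt"
    readme.write_text(NOTICE, encoding="utf-8")
    return readme


# Demonstration in a temporary directory, so no real project is touched.
with tempfile.TemporaryDirectory() as tmp:
    path = write_raw_data_notice(tmp)
    print(path.read_text(encoding="utf-8"))
```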

What are different types of documentation? 

We can document both the study (project-level documentation) and the data itself (data-level documentation).  

  • Project-level documentation explains the aims of the study, the research questions or hypotheses, the methodologies used, the instruments and measures used, etc.
  • Data-level or object-level documentation provides information about individual objects, such as images, transcripts, or variables in a database or table.

How do I create documentation for my project? 

One way of creating systematic documentation for your study and data is to use README.txt files.  

README.txt files are simple text files that provide information about any elements (or any workflows or processes these elements have been used for) in the same directory. In other words, they provide a map through the files in the directory.  

README.txt files serve as instructions both for others and for the “future you” who will reuse the data or redo the analyses.

You can use this guide to start creating README.txt files for your project and data.  

Note that some of the information you include in these files will overlap with administrative and/or descriptive metadata (metadata is also documentation!).

Another important aspect of good documentation is version control. By assigning a version to every data transformation and every update of a project document, you can track (and later trace back!) any changes. A simple version control strategy is to write the version number in the file name (e.g. “Manuscript_draft_V1” or “Processed_dataset_V1”) and update it when needed. There are also more automated version control tools, such as Git, which is commonly used for versioning analysis code.
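The file-name strategy can be sketched in Python as a small helper that increases the version number. The function `bump_version` is hypothetical and assumes the `_V<number>` suffix sits at the end of the name, directly before an optional file extension.

```python
import re

# Matches "_V<number>" at the end of the name or just before the extension.
_VERSION = re.compile(r"_V(\d+)(?=\.[^.]+$|$)")


def bump_version(filename):
    """Return filename with its _V<number> suffix increased by one."""
    match = _VERSION.search(filename)
    if match is None:
        raise ValueError(f"No version suffix found in {filename!r}")
    new_version = int(match.group(1)) + 1
    return f"{filename[:match.start()]}_V{new_version}{filename[match.end():]}"


print(bump_version("Manuscript_draft_V1"))
print(bump_version("Processed_dataset_V9.csv"))
```

Saving each new version under a new name, rather than overwriting the old file, keeps earlier versions available for comparison.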

Need advice?

Contact us at: research-data@uio.no

Tags: data, metadata, documentation
Published June 16, 2022 9:24 AM - Last modified Apr. 12, 2023 11:05 AM