Before we dive further into the topic, I want to make clear that this article is about the FAIR data principles (Findability, Accessibility, Interoperability and Reuse), which aim to help people develop a better strategy for the continued reuse of their digital assets.
I recently wrote a white paper for one of my clients looking at the FAIR data principles in relation to databases, and at how graph databases, a new breed of NoSQL databases, are a perfect fit when it comes to implementing these principles. You can read the article here.
In this article I don’t want to look at databases but rather at documents, and at how small businesses can draw on the FAIR data principles to develop a strategy for storing them, whether they are accounting, commercial, trading or intellectual property documents.
In some ways the context of documents is potentially closer to the original development of these principles than the context of databases. In the original paper that introduces these principles the authors talk about the publication of digital resources and how these principles can improve the value of these digital resources not only for now but also in the future.
You might think, though: "Why is this for me? I don’t publish digital resources and I definitely don’t publish them for a wider audience". But this is exactly what you are doing. Documents are very much digital resources, and by saving them to your disk, to the cloud or to an external storage device you effectively publish them with the intention of referring back to them in the future. And while they are not for a wider audience today, you might need other people to be able to refer to these documents in the future.
Documents are not just for Christmas, they are for life
This should be the first take-home message. Document storage is not just an archival process; it should be seen as an active process that has the potential reuse of your documents at its center. How often have you remembered that you had written or read about something in one of the documents you have stored away, only to be unable to find it again and end up really frustrated? Remember that frustration the next time you store a document for the first time.
The FAIR data principles in context
The FAIR data principles are based on the four cornerstones of findability, accessibility, interoperability and reuse. At the core of the whole idea, however, is the notion that your digital resources (read: documents) are described by clear, meaningful additional information, referred to as metadata.
Let’s have a look at the four principles one by one.
Findability
The key requirements for findability can be summarised as:
- assignment of a unique identifier
- rich metadata to describe the resource
- resources and their description are stored in an indexed and searchable manner
Taking these requirements into account, there are several points to consider. Naming your documents is key so that you can be sure in the future that a document is the one you are looking for. This is particularly important when you revise a document over time: you need to be able to identify the version that was relevant or up to date at a particular time. The approach I use is to prefix every document name with the date and the version of the document, followed by a descriptive title, following the convention yyyymmddvv_DocumentTitle. An example would be
2020100301_ServiceProposalClientName
Also note that the document name does not contain any spaces. While spaces are not a problem for most modern operating systems and platforms, there are still constraints around spaces in file names. Here I use a combination of an underscore (low line) and PascalCase, where the first letter of every word is capitalized.
Document properties are another feature that is your best friend. Always make sure that you use them. They allow you to organise your documents by category, tags and so on.
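If you would rather not type these names by hand, the convention is easy to automate. The following is only a minimal sketch in Python: the helper names build_name and parse_name are made up for this illustration, and it assumes a two-digit version number and a title consisting of letters and digits.

```python
import re
from datetime import date, datetime

# Hypothetical helpers for the yyyymmddvv_DocumentTitle convention.
# Assumption: the name is handled without its file extension.
NAME_PATTERN = re.compile(r"^(\d{8})(\d{2})_([A-Za-z0-9]+)$")


def build_name(day: date, version: int, title: str) -> str:
    """Return a name such as 2020100301_ServiceProposalClientName."""
    if not 0 <= version <= 99:
        raise ValueError("version must fit into two digits")
    # PascalCase: capitalise the first letter of every word, drop the spaces.
    words = re.findall(r"[A-Za-z0-9]+", title)
    if not words:
        raise ValueError("title must contain at least one word")
    pascal = "".join(word[0].upper() + word[1:] for word in words)
    return f"{day:%Y%m%d}{version:02d}_{pascal}"


def parse_name(name: str) -> tuple[date, int, str]:
    """Split a name into (date, version, title); raise ValueError if it does not fit."""
    match = NAME_PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a yyyymmddvv_DocumentTitle name: {name!r}")
    day = datetime.strptime(match.group(1), "%Y%m%d").date()
    return day, int(match.group(2)), match.group(3)
```

Calling build_name(date(2020, 10, 3), 1, "service proposal client name") reproduces the example name above, and parse_name can be used to check existing names; strip the file extension before passing a name in.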
You should also develop a folder structure to organise your documents in a structured and understandable way.
This brings us to the last point, indexing and searchability. The great news is that all modern operating systems index your documents, including properties and even content, if you set your computer up accordingly.
Having said that, in this age of online services and software as a service, it is more likely than not that you save your documents not locally but "in the cloud". Most cloud storage providers (actually all the ones I use) offer quite comprehensive indexing and search functionality. Most of them also offer version control, which can make your life easier. If you have signed up for one of the Microsoft 365 business offerings, you even have access to SharePoint, which provides excellent document management capabilities. Just remember that some of this will be lost if you move your documents to another provider, particularly features like custom properties and version control. It is best to stick with the properties that are stored as part of the document itself.
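Built-in properties travel with the file because they live inside the document. A .docx file, for example, is a ZIP archive, and Word stores the title, subject, keywords and category in an XML part inside it. The sketch below reads them with nothing but the Python standard library. The function name read_core_properties is made up for this illustration, and it assumes the part sits at docProps/core.xml, which is where Word places it.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core properties part of an Office Open XML package.
NAMESPACES = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The built-in properties this sketch looks at, and the XML element holding each.
WANTED = {
    "title": "dc:title",
    "subject": "dc:subject",
    "keywords": "cp:keywords",
    "category": "cp:category",
}


def read_core_properties(path: str) -> dict[str, str]:
    """Return selected built-in properties of a .docx/.xlsx file (hypothetical helper)."""
    with zipfile.ZipFile(path) as archive:
        # Assumption: the core properties are stored at this location.
        root = ET.fromstring(archive.read("docProps/core.xml"))
    properties = {}
    for key, tag in WANTED.items():
        element = root.find(tag, NAMESPACES)
        # A missing or empty property is reported as an empty string.
        properties[key] = element.text if element is not None and element.text else ""
    return properties
```

Because these values are part of the file, they survive a move from one storage provider to another, unlike properties that only exist in a provider's own database.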
Accessibility
The key factors related to accessibility are
- the protocol to access your resources is free, open and universally implementable
- the protocol allows for an authentication and authorisation procedure
- information about your resource should be available even when it has been deleted
When it comes to accessibility, any cloud solution beats the local disk. First of all, access to your documents is through a web browser, which means the protocol used to access your documents is HTTP (or its secure variant, HTTPS), which is universal and here to stay. You therefore don’t have to worry about the physical storage of your documents. They will remain available no matter how your provider changes their setup. It is also possible to access your documents from anywhere in the world, irrespective of whether you have your computer with you or not.
Sharing is also possible with minimal effort and without creating a plethora of document copies that you cannot control. But even if you store your data locally this should be OK, as disk formats have been very stable and in most cases there is backwards compatibility. Obviously, if your computer breaks and you have no backup, your documents will be lost, something that is far less likely to happen with data stored online. A problem might arise when it comes to storage media. You are probably out of luck if you still have floppy disks lying around, or some other obscure storage medium that was en vogue at some point (anybody remember Zip disks?).
Another aspect that we need to think about is software-specific file formats. You probably won’t have any problems if you stick to the main contenders (Microsoft Office, Adobe etc.), but be careful if you work with documents that have been generated by proprietary software or by lesser-known alternatives to the main software vendors. Therefore, where possible, use file formats that are universally available. Examples are the newer docx, xlsx etc. file formats for Microsoft Office, and PNG, JPEG or, if possible, SVG for graphics. When it comes to text documents, don’t knock the good old plain text format, an oldie but goodie. If you consider technologies such as Markdown (a lightweight text formatting syntax), you can even create good-looking documents based entirely on plain text files that are still readable without any rendering. And the best part is that you can easily convert them to HTML and view them in their final form in a web browser.
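To give an idea of how little is needed to turn such a text file into HTML, here is a deliberately tiny sketch in Python. It is not a real Markdown implementation: it only understands "#" headings and plain paragraphs separated by blank lines, and the function name markdown_to_html is made up for this illustration. For real documents you would use an established converter such as pandoc.

```python
import html


def markdown_to_html(text: str) -> str:
    """Convert a tiny Markdown subset (# headings and paragraphs) to HTML."""
    blocks = []
    # Blocks are separated by blank lines.
    for block in text.strip().split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):
            # The number of leading '#' characters gives the heading level (max. 6).
            marks = len(block) - len(block.lstrip("#"))
            level = min(marks, 6)
            content = html.escape(block[marks:].strip())
            blocks.append(f"<h{level}>{content}</h{level}>")
        else:
            # Line breaks inside a paragraph are treated as ordinary spaces.
            content = html.escape(" ".join(block.split()))
            blocks.append(f"<p>{content}</p>")
    return "\n".join(blocks)
```

The important point is not the converter but the source file: even without any tool at hand, a line starting with "#" is still recognisable as a heading to a human reader.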
Interoperability
When the FAIR data principles talk about interoperability they refer to
- the use of a formal, accessible and broadly applicable language for knowledge representation
- the use of vocabularies that themselves follow the FAIR principles
Interoperability is all about understanding. When you search for documents, it needs to be clear from the outset what a document is about. For the search itself it is also important that the search terms are actually part of your document description or of the document itself. Keep that in mind when you add categories and tags to your documents.
It is also important that the additional information you add to your documents through properties or custom properties is well defined. If you add keywords without giving them much thought, it is very likely that they won’t help you retrieve your documents in the future. You should develop a hierarchy for your keywords, with strict categories, slightly more relaxed tags and free-text descriptions. This way you will always be able to return to a document without having to go through all of them one by one.
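The three levels of this hierarchy can be pictured as a small data structure. The sketch below is only an illustration in Python: the class name DocumentMetadata and the list of categories are assumptions (the categories simply reuse the document types mentioned at the start of this article), and your own vocabulary will look different.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: a category must be one of these values.
CATEGORIES = {"accounting", "commercial", "trading", "intellectual-property"}


@dataclass
class DocumentMetadata:
    category: str                                   # strict: fixed list only
    tags: list[str] = field(default_factory=list)   # relaxed: free, but normalised
    description: str = ""                           # free text

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category!r}")
        # Normalise tags: trim, lower-case, drop empties and duplicates, keep order.
        cleaned = (tag.strip().lower() for tag in self.tags)
        self.tags = list(dict.fromkeys(tag for tag in cleaned if tag))

    def matches(self, term: str) -> bool:
        """Return True if the search term hits the category, a tag or the description."""
        term = term.strip().lower()
        if not term:
            return False
        return (
            term == self.category
            or term in self.tags
            or term in self.description.lower()
        )
```

Rejecting unknown categories is what keeps the top level strict, while tags are only tidied up so that "Proposal" and "proposal" do not end up as two different keywords.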
When it comes to the physical organisation of your documents it is important that you develop a robust meaningful file and folder structure that you can navigate with ease.
Reusability
There is not much to say about reusability. In many ways reusability is what you get when you follow the other three principles.
There are some specific points though that are mentioned in the context of reusability
- data resources need to be accompanied by clear usage licenses
- data resources need to be associated with detailed information about provenance
What does that mean for your documents? First and foremost, you need to ensure that documents are clearly associated with clients and customers, so that you can establish if and when you can reuse the documents or send them to somebody. You might also have access to documents that include copyrighted information that you are not allowed to distribute. An example would be PDF files of articles or books that you have access to through a subscription service. In these cases you might be able to cite the information in these documents, but this means you need to know where they came from.
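One simple way to keep this information with a document is a small text file stored next to it. The sketch below writes and reads such a record as JSON in Python. All names in it (Provenance, the four fields, to_sidecar_json, from_sidecar_json) are assumptions made for this illustration, not an established standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Provenance:
    client: str           # client or customer the document belongs to
    source: str           # where the document came from
    license: str          # usage terms, as free text
    may_distribute: bool  # whether you are allowed to send it on


def to_sidecar_json(record: Provenance) -> str:
    """Serialise the record as human-readable JSON text."""
    return json.dumps(asdict(record), indent=2, sort_keys=True)


def from_sidecar_json(text: str) -> Provenance:
    """Rebuild the record from JSON text written by to_sidecar_json."""
    return Provenance(**json.loads(text))
```

Because the record is plain text, it remains readable in any editor, which fits the earlier advice about universally available file formats.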
Final thoughts
Just to summarise, the FAIR data principles can provide you with clear guidance on how to establish good document management practices. The key take-home messages are to name your documents clearly for future identification and to add as much information as possible through document properties, so that finding relevant documents becomes easier in the future.
Obviously you could opt for a dedicated document management solution, but the truth is that these also require a lot of work to establish and maintain. Nothing is for free. This effort might be too much, particularly at the beginning or while your company is still small. Always remember that once you have put a lot of time and effort into a particular proprietary solution it is very difficult to switch, something that might become necessary if you need to scale up or out.
Another thing you might have to consider is the use of all these apps. While they might be helpful in a particular context, they provide a fragmented approach to document management that can result in document silos that are very difficult to penetrate. An example might be some types of documents in your accounting software, some in the cloud storage offered for free by one of your software vendors, some on a free Google or OneDrive account, and so forth. I think you get the picture. Try to keep everything together using a single consistent approach.
I hope this short excursion has inspired you to look at the FAIR data principles in more detail and to use them to evaluate your document, and perhaps data, management strategies.
Think FAIR, think positive.