Improving Versioning


Read the source on Semantic Versioning: https://semver.org/

Summary

  • proposal to address issues with consistent versions of machines
  • OSE's core business is actually producing documentation for machines available over a lifetime
  • Good about the current situation
    • proven technology, easy-to-use centralized wiki
    • easy-to-use Google Presentations cloud application for build manuals
  • Problems with the current situation
    • wiki unstructured and overwhelming
    • no real versioning in wiki and Google Presentations
    • reliance on volatile, proprietary Google cloud services
    • binary format for Google Presentations prevents proper versioning
  • Requirements
  1. no major change in current workflow
  2. keep the wiki as central point
  3. do not rely on the wiki as a central point
  4. apply proper versioning reflecting severity of change
  5. true consolidation of machines with lifetime releases
  6. versioning supporting modularity
  7. no reliance on proprietary software/cloud services
  8. support textual format as much as possible for universal versioning
  9. same ease-of-use as Google Presentations
  10. automatic index for machines, modules, projects on the wiki
  • Proposal
    • adopt semantic versioning
    • adopt Git with submodules
    • create lifetime releases with Zenodo or IPFS
    • automatically generate wiki pages, build manuals from Git repositories
    • find a way to use a textual format as an easy-to-use replacement for Google Presentations
  • Concrete steps
    • Investigate the above in the Hamburg STEAM Camp

Introduction

Several OSE projects are becoming viable products; for example, the D3D Universal and D3D Pro are being sold as kits. However, the nature of OSE is that many projects are developing at a high rate, and versions may change a lot in a short time. For example, the D3D Universal changed considerably from the summer of 2019 to January 2020, and possibly as a result the documentation on the wiki was not consistent.

On this page we will try to address some of these issues. We first elaborate on the context, then discuss what is good in the current situation. We then define the problem and finally outline a set of ideas and requirements to improve on the status quo.

Context

Many machines are reaching a state that makes them viable products. For others to be able to use them, we need proper documentation that is consistent across versions. The machines are highly modular, which means that some version of machine A uses some version of project B. If B advances, it may no longer fit A. It is therefore necessary to keep track of which version of each component a machine uses.

We could ask ourselves what OSE's core business is. An acceptable answer would be: producing the machines of the GVCS. A more refined answer, however, is that OSE produces the documentation to build the machines of the GVCS. To be able to build and repair a machine, it is necessary to understand it; hence the documentation.

An interesting feature of the OSE machines is that they should be usable over a lifetime, potentially indefinitely. However, this also requires the documentation to be available over a lifetime. It is not clear whether that is currently guaranteed.

Another question is then: what does it mean for a machine to be available over a lifetime? Is the documentation of a machine built now still available in 20 years? Is the documentation of that specific version available? (The machines built now are versioned by year and month.) Or is only the current documentation of the machine in its current state available?

Qualities of the current situation

The current system in place has various good features. The OSE wiki has a page for each project where information and build manuals are stored. A wiki is a proven method with a relatively low entrance threshold for users. The wiki logs changes to the pages and the people making them. Major benefits are:

  • one central point of entry
  • anyone can join
  • relative ease of use
  • logs activity with changes
  • proven technology

Many of the build manuals are created in Google Presentations, a very easy-to-use cloud application. Anyone can view a presentation and download it in various formats such as Microsoft PowerPoint, LibreOffice, or PDF. It also supports templates that provide a fixed style, and the application keeps track of versions, although, unlike the wiki pages, these versions are not available to the public.

Although we list qualities here, they are debatable. The wiki's ease of use is relative and depends strongly on the user's prior experience with editing wikis. Since editing and viewing the rendered result require extra clicks, the workflow is not ideal. Embedding external resources is brittle and requires copy-and-pasting HTML snippets (and the rendering of these external links depends on the browser's privacy settings). And although Google Presentations is indeed easy to use, keeping formatting consistent can be challenging and time-consuming.

Problem definition

Although there are certainly good things to say about the current situation, it is unfortunately also possible to name deficiencies. One problem of the wiki is that it contains a large amount of information in a more or less unstructured way. Navigation relies on links between pages; there is machine information, firmware, "meta" topics such as this page, and so on. A common complaint is that users find it hard to find their way and are overwhelmed by the content.

Although a wiki keeps track of changes, it does so on a per-page basis and is not really meant for versioning. (It is rather a mechanism for making and comparing snapshots.) To the best of my knowledge, it is not possible to mark a page as being in a state that should be consolidated, for example for a machine release. Furthermore, there is no mechanism for in-place comments (and comments on comments), which seems desirable for a collaborative workflow, and resolving merge conflicts in the wiki is not intuitive.

A problem with the way machines are currently versioned is that the version number does not reflect how severe a change is. Does a minor change, for example a new Z-sensor on a 3D printer, lead to a new version number? Perhaps it does and gets version 20.04; but if the layout of the axes changes the next month, it gets version 20.05, even though that is a major change. And 3D printer 20.05 may use an improved universal axis, but which version? Do we consistently change the 3D printer's version number when the universal axis's version number changes? Managing these version numbers requires a lot of discipline in the current situation.

The build manuals are drafted in Google Presentations, but this has drawbacks. Firstly, editing requires a Google account. Secondly, the application is a proprietary cloud application from a major company with so many resources that it can build an extremely high-quality cloud application; however, this results in a high degree of vendor lock-in. Google cloud services are highly volatile: see the discontinuation of Google Reader and Google+, and the number of video-conferencing applications that may or may not be merged (Google Meet, Google Duo, Google Hangouts, Google Chat), all of which may or may not be integrated with Gmail. The point is that relying on such a large, volatile company, where services may be canceled and terms of service may change, is not a very "open" way of working.

It is debatable to what degree vendor lock-in is acceptable, as there are also examples without much reliance on a particular cloud service. For example, using GitHub may be acceptable if the repository is mirrored in some form. GitHub's issue tracker may be leveraged because it gives out-of-the-box, user-friendly functionality, but with the understanding that it may go away at any time (e.g., because of a DMCA take-down).

More practical issues are the following: the internal format is not exposed to the user, and the format is binary instead of textual, which makes tracking changes more cumbersome and dependent on the application used.

Requirements

Based on the current qualities and problems we can define a set of requirements:

1) No major change in the status quo

The current way of working with the wiki has obvious benefits. Changing the way OSE works in a major way is likely unnecessary and would require the whole community to make a big change.

2) Keep the wiki as a central point

The wiki is a proven technology with a low entrance threshold. It brings a community together in a central place. However, central points have drawbacks, certainly in an open-source project.

3) Do not rely on the wiki as a central point

The danger of a central point is that if the wiki goes offline, a lot of very valuable information, a legacy, will be lost. It may be possible to protect against this.

Can we download and back up all of the wiki, or parts of it, as one can with Wikipedia? Magnetic tape cartridges or archival optical disks are cheap per terabyte and can last a long time; two of those physically in separate places, plus a copy in the cloud (and frequent backups), and we should be good. Also, is there any way to trigger a Wayback Machine "copy"?


Hint: The wiki is currently backed up daily, weekly, monthly, and yearly on Backblaze. Failure would require both the server, with its RAID-1 backup, and the Backblaze backups to be destroyed simultaneously. The wiki is also available for off-line viewing, so there is a distributed copy among users who have downloaded it. See more at https://www.opensourceecology.org/offline-wiki-zim-kiwix/. Currently, the entire wiki is 2.8 GB.

4) Apply proper versioning

A good approach would be to apply versioning in such a way that a version change reflects the severity of the change.

5) Lifetime releases

True consolidation of versions in the form of releases (releases of the documentation) that will never be lost and remain available for eternity, or at least for the lifetime of the machine.

6) Versioning supporting modularity

The GVCS machines are highly modular. There should be a proper system in place that can keep track of the various version numbers of the sub-releases.

7) No reliance on proprietary cloud services

For an open project such as OSE, it would be preferable not to rely on proprietary software, nor on proprietary cloud services.

8) Support textual format as much as possible

Proper versioning works best with textual formats. Versioning binary formats requires support from the application that produces them, whereas versioning textual formats can be done with any tool. This makes versioning universal.

9) Google Presentations ease-of-use

For documentation of the machines, the build manuals, we would like to have the same ease-of-use as Google Presentations offers.

10) Automatic indexing

For the machines and the modules that are part of them, it would be helpful to have some form of automatic indexing. Essentially, it would be good to have more structure on the wiki.

Proposal

If one agrees that an important objective of OSE is to produce documentation for building the machines of the GVCS, then we can look at other fields that deal with structured information. It makes sense to look at software engineering, which already has a significant open-source movement.

Semantic versioning

Semantic versioning comes from the field of software engineering. With semantic versioning, a version number consists of three numbers, for example 2.3.4, and these numbers convey how big a change is. Going from 2.x.x to 3.0.0 is a major change; going from 2.3.x to 2.4.0 is a minor change, for example adding a new Z-sensor to a 3D printer; and going from 2.3.4 to 2.3.5 is a very minor change, typically a fix, such as replacing an axis-stop with a new version that works better than the previous one.
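
As a sketch, reusing the examples above (the version numbers themselves are hypothetical), the scheme maps onto machine changes as follows:

  2.3.4 -> 2.3.5   patch: replace the axis-stop with a better-working part
  2.3.5 -> 2.4.0   minor: add a new Z-sensor, existing builds remain valid
  2.4.0 -> 3.0.0   major: change the layout of the axes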

This scheme is dominant in software engineering, and it may be wise to learn from it and adopt it. Doing so would satisfy requirement 4.

In addition to semantic versioning, which is linear, we also have to think about accommodating "alternatives/options/configurations". For example, the "same" printer has different components for the power supply depending on the region (e.g., the US has a US plug and GFCI, most of Europe has a Europlug, etc.). Printer versions have gone back and forth between 2-pin and 3-pin endstops, but it may make sense to offer both configurations as a user-selectable option (with a matching firmware configuration depending on the choice).
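
Semantic versioning offers a possible hook for such configurations: build metadata appended after a '+' does not affect version precedence, so the same release could carry a label per configuration. A hypothetical sketch:

  1.3.0+us        release 1.3.0, US plug and GFCI
  1.3.0+eu        the same release, Europlug
  1.3.0+eu.3pin   configuration identifiers can be combined with dots

Whether this is expressive enough for full configuration management remains to be investigated.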

Distributed versioning systems

Software engineering projects typically use dedicated versioning systems. There are centralized versioning systems such as the old CVS and the still-used SVN, in which committing a change registers it on a central CVS or SVN server. This is not ideal for open-source projects: if a project becomes less popular and someone decides to turn off the server, all version history is gone.

Therefore, distributed versioning systems such as Mercurial or Git are more popular today; Git especially is very popular. Distributed versioning systems work differently: each user "clones", i.e. receives a copy of, the complete version history of the project. If a centralized repository is discontinued, any user can reignite the project. In addition, committing changes and publishing them to a shared repository are decoupled.

An extra feature of Git, for example, is submodules. A submodule is a Git repository inside another Git repository; the parent repository keeps track of a specific commit of the child repository. This mechanism can help us keep track of modules: for example, version 1.0.0 of the D3D-Universal uses version 0.8.2 of the Universal Axis, whereas version 1.5.0 of the D3D-Universal has moved to version 1.0.2 of the Universal Axis.
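
As a minimal sketch (the repository URL and paths are hypothetical), pinning the Universal Axis inside the D3D-Universal repository could look like this:

  git submodule add https://example.org/ose/universal-axis.git modules/universal-axis
  cd modules/universal-axis
  git checkout v0.8.2        # pin the module to a specific release
  cd ../..
  git commit -am "D3D-Universal uses Universal Axis v0.8.2"

Upgrading later, for D3D-Universal 1.5.0, would be a matter of checking out v1.0.2 in the submodule and committing that change in the parent repository.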

Git repositories typically have a master "branch", where a branch can be considered a chain of commits. A commit is a change to the repository made by a user: fixing a typo, updating a version number, or a big change such as removing many files. Many people can collaborate on a project by creating their own branches, for example 'add-second-z-axis' or 'fix-z-sensor-problem'. People commit on these branches, and at some point the work is ready and they merge their branch into the master branch, which essentially updates the master branch with the new feature. The merge itself is also a commit. If, for example, both 'add-second-z-axis' and 'fix-z-sensor-problem' are ready and merged into master, the team could decide that this becomes a new version and 'tag' the current commit with a version number, for example 'v1.3.0'.
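
A hedged sketch of this cycle on the command line, reusing the branch and version names from the example:

  git checkout -b fix-z-sensor-problem   # create and switch to a feature branch
  git commit -am "Fix the Z-sensor problem"
  git checkout master
  git merge fix-z-sensor-problem         # the merge is itself a commit
  git tag v1.3.0                         # mark this commit as a release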

Using Git would also require some kind of (more or less) formalized process/workflow that spells out who can make which changes. Git is workflow-agnostic (a mixed blessing), and there are reference workflows that could be used or adapted. (For example, FreeCAD uses the "dictator and lieutenants" workflow.)

The benefit of Git is that people can collaborate in a structured way, and it supports semantic versioning. The repositories are decentralized, which makes losing the history of a project unlikely. Adopting Git would satisfy requirement 6 and would help with requirements 3 and 4. However, it may be a disruptive change, conflicting with requirement 1.

Lifetime releases

There are multiple ways to satisfy requirement 5, lifetime releases. First, proper versioning (requirement 4) has to be established. One way to create releases for eternity is Zenodo, a data repository run by CERN with funding from the European Union, using the infrastructure built for the Large Hadron Collider. It allows you to take a snapshot of a Git repository and associate a unique DOI with it. This essentially ensures that releases will be around for eternity.

Another solution is the InterPlanetary File System (IPFS), where we can store and possibly 'pin' versions indefinitely. This relies on multiple copies of each version being hosted by different nodes in the network.
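
With the standard IPFS command-line tools, publishing a release could look like the following sketch (the directory name is hypothetical; the CID is the content identifier that uniquely addresses this exact snapshot):

  ipfs add -r d3d-universal-1.0.0/   # prints the CID of the release directory
  ipfs pin add <CID>                 # run on other nodes to keep extra copies

Note that a release only stays available as long as at least one node keeps pinning it.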

Generating wiki pages

As the benefit of the wiki is clear, we need some way of bringing all of the above together. A proposal is to move serious development of the machines into dedicated Git repositories, for example 'universal-axis' and 'd3d-universal'. Each repository contains a LICENSE file, a README, a CONTRIBUTING file, a Changelog file, a VERSION file, and a Makefile, just as in an open-source software project. The README explains how the repository works, which machine it is for, and so on. The CONTRIBUTING file defines a set of guidelines to help users contribute content. The Makefile is essentially a script that keeps track of dependencies between files and 'builds' the project; we will come back to this later. The other files are self-explanatory.
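
A possible layout of such a repository (the file names are as listed above; the directory structure is a hypothetical sketch):

  d3d-universal/
  ├── LICENSE
  ├── README
  ├── CONTRIBUTING
  ├── Changelog
  ├── VERSION
  ├── Makefile
  ├── docs/                  # build manual and wiki page sources, textual
  ├── cad/                   # FreeCAD files of the machine
  └── modules/
      └── universal-axis/    # Git submodule pinned to a release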

The main content of the repository consists of documentation in a textual format together with the CAD files of the project. With all this content we can 'build' the project, which in this context means generating documents for the machine: 'make' the build manual, 'make' the wiki page, or 'make' the BOM. The make script would ideally also generate images from the CAD files automatically, for use in the generated documents. (This seems feasible for FreeCAD [1], since it can be scripted and run headless.)
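
A minimal Makefile sketch under these assumptions (the target names, file names, the Pandoc invocations, and FreeCAD's command-line mode freecadcmd are illustrative, not a tested build):

  # 'make manual.pdf' renders the build manual from its textual source
  manual.pdf: docs/manual.md images
  	pandoc docs/manual.md -o manual.pdf

  # 'make wiki-page.wiki' generates MediaWiki markup for the project page
  wiki-page.wiki: docs/manual.md
  	pandoc -t mediawiki docs/manual.md -o wiki-page.wiki

  # render images from the CAD files with a headless FreeCAD script (hypothetical)
  images: cad/d3d-universal.FCStd
  	freecadcmd scripts/render-views.py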

In this way, we can generate a wiki page and add it to the wiki, ensuring that the information on the wiki remains consistent. We could also automatically create an index on the wiki for several of these projects, satisfying requirement 10.

Ideally, we could even try to automatically incorporate edits made on the wiki page back into the Git repository, so that regular users can still contribute in a user-friendly way. (This can be seen as a form of round-trip engineering.)

Unfortunately, all of this does not fully satisfy requirement 1; however, the benefits may outweigh the change. It does satisfy requirement 2, and because of the decentralized nature of Git repositories it satisfies requirement 3. Git is an extremely powerful and therefore also complex tool; however, the basic commands are easy to learn with proper documentation and tutorials. Still, the overall ease of use is somewhat diminished (requirement 9).

Textual format

To make this all work, it would be ideal if most of the content in the repository were textual, since that makes tracking changes easy. Binary files increase the size of the repository (every new commit of a binary file grows the repository by the size of the file), and it is impossible to see what change was made. Unfortunately, FreeCAD files are also binary, but because an FCStd file is essentially a ZIP archive, we could make Git a bit smarter to allow proper versioning of these files as well, as sketched below.
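
Git's textconv mechanism is one way to achieve this: since an FCStd file is a ZIP archive containing mostly XML, a custom diff driver can unpack it to text before comparing. A sketch (the driver name 'fcstd' is arbitrary):

  # .gitattributes
  *.FCStd diff=fcstd

  # one-time Git configuration per clone:
  git config diff.fcstd.textconv "unzip -c -a"

This makes 'git diff' show textual changes inside FreeCAD files; it does not reduce the repository growth caused by storing each binary revision.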

How to satisfy requirement 7, moving away from Google, while maintaining the same ease of use (requirement 9) and supporting a textual format (requirement 8), is an open question. One option is the Markdown format in combination with Pandoc to convert it to various other formats. Another option is Org mode, which has excellent integration with the Emacs editor.
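
As an illustration of the Pandoc route (the file names are hypothetical):

  pandoc manual.md -o manual.pdf    # print-ready manual; requires a LaTeX engine
  pandoc manual.md -o manual.pptx   # slide deck, comparable to a Google Presentation

This would keep the sources textual and versionable while still producing the familiar output formats.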

Mirror the OSE Wiki

To satisfy requirement 3, one may mirror the OSE wiki on a separate server or on IPFS. Then, if the main OSE wiki goes offline, it can be reignited.

Concrete Steps

A proposal is to investigate all of this as part of the STEAM Camp Hamburg. A good use case is to apply the above methodology to the D3D-Universal. After this experiment, we can evaluate the methodology against the current workflow.

See Also

Useful Links