Talk:The Case for Using Wiki Version History

From Open Source Ecology

Meta-Discussion

Major advantages of Git - what are those advantages? What problem are we trying to solve?

Once we are clear on the advantages and on the problem we are solving, we have an answer.

The first question for you - what are the advantages of Git that our workflow can benefit from? What problem does a Git platform solve that is important to us and that is not addressed by wiki versioning?

Major Advantages

  1. Support - MediaWiki’s main target is Wikipedia. It was intended as an encyclopedia platform, not as a development environment, so support for using it as a version control system is likely to be limited.
  2. Robustness - Git is well tested as a versioning system, whereas few people use MediaWiki as a versioning tool.
  3. Data Replication - With Git, every collaborator has a copy of the files, so they can work even when the server is down. Should a problem with the server happen, every collaborator has a backup.
  4. Versions have a hash key - In Git, every commit has a unique hash by which it can be referenced, so any version can always be linked to. In MediaWiki, on the other hand, there is only a timestamp, and the link to the current version stays the same after a new version is added, so it no longer points to the same content. Only old versions can be accessed directly via their own unique link.
  5. Extensibility/Automation - Git servers usually have the ability to do certain tasks when a new version is uploaded. A continuous integration server can automatically check files for correctness and create preview images.
  6. Access Control - In Git, some of the developers can act as maintainers who have to review changes before they make it into the official version.
  7. Accidental Overwrites - In MediaWiki, two people could submit a file at the same moment without seeing that another person did the same, and one of the submissions would be overwritten. Git prohibits this behavior.

That said, I can see that MediaWiki might just be good enough for now, but I would also be open to doing a few demos on how Git could be used as an alternative.
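To make the "versions have a hash key" point concrete, here is a minimal sketch of how Git derives an identifier from content. It shows the ID of a single file (a "blob") in Git's default SHA-1 mode; commit IDs are built the same way over a commit object that references the files and the parent commit. The sample contents are made up for illustration.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the ID that Git (SHA-1 mode) assigns to a file with this content.

    Git hashes a small header ("blob <size>" followed by a NUL byte)
    together with the raw bytes of the file.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# Two hypothetical versions of the same file.
version_1 = b"hello\n"
version_2 = b"hello world\n"

print(git_blob_id(version_1))
print(git_blob_id(version_2))

# The same bytes always give the same ID, and any change gives a new one.
print(git_blob_id(version_1) == git_blob_id(b"hello\n"))
```

Because the ID depends only on the content, a link containing it cannot silently start pointing at a newer version.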

Is any one of the reasons important enough?

Counterarguments on Major Advantages

  1. Support - MediaWiki has a version control system, so in that sense it supports file versioning. Our requirement is to keep all files from all contributors easily accessible, time-stamped, and downloadable at any time without risk of data loss and without merge conflicts, while requiring only widely accessible skill levels for team coordination. I think MediaWiki does a better job on this problem statement, because you can readily see all the people working on an issue in the history. Further, our use of Visual Version History and CAD Part Libraries allows for quick visual diffs, whereas with Git ready visual diffs would be much more difficult, as the work would be spread throughout many developers' accounts until it is committed to master. Navigating version histories on the wiki is readily accessible to novices, whereas GitHub is pretty much for power users.
  2. Robustness - Yes, Git is robust for software, but it is not robust for open hardware development. For example, it can't handle video. So Git is less robust for the more complex, integrated hardware development process that OSE uses. It is certainly more robust for software, and we should use Git for software.
  3. Data Replication - The same is effectively true for the wiki, because every collaborator has a copy of the files they are working on. I would argue that full data replication is not possible for hardware, as the whole project is terabytes (from videos and heavy assets such as large CAD assemblies). This is why a wiki-centric infrastructure is good. So this points to defining the problem statement clearly: Git is good for software, not for hardware. The initial question is: why use wiki version history? The second question we are asking is: why use the wiki for project management as opposed to Git? Having specified the questions, we can answer: we use the wiki for overall development and Git for software only, as a general best practice.
  4. Versions have a hash Key - wiki has the equivalent functionality of a hash key.
  5. Access control - The GVCS (creating a new infrastructure for civilization) is a broader project, therefore we allow everybody in without the need for access control. The only issue is spam, which we address by manual approval of new users.
  6. Accidental Overwrites - I'm not aware that this is an issue. If there is a submission conflict, that is clear to the submitter - their upload fails, and they upload the file again. If 100 people do this, they just try again and the uploads should be reconciled in seconds. But there is typically never a large number of people working on a single file, as we normally coordinate so that if multiple people are working, we work on different files, or break the file into smaller parts for multiple people to work on. Further, we also encourage people to use their work log as a repo, and add to the main trunk at a later date. Accidental overwrite can be an issue for page edits - but not for file uploads. For page edits, we use embedded documents, which are crowd-collaborative-realtime-editable and never suffer from 'submit' conflicts.
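The overwrite question from both lists can be sketched as a toy model. This is not how MediaWiki or Git is implemented; `FileStore`, `StaleUploadError`, and the `based_on` parameter are hypothetical names used only to show the rule "reject a submission that was not based on the current version", which is roughly what both sides describe (a rejected push in Git, a failed upload followed by a retry on the wiki).

```python
class StaleUploadError(Exception):
    """Raised when a submission is based on an outdated version."""


class FileStore:
    """Toy in-memory version history for a single file."""

    def __init__(self):
        self.versions = []  # versions[i] holds the content of version i + 1

    def head(self) -> int:
        """Number of the current version (0 means nothing uploaded yet)."""
        return len(self.versions)

    def upload(self, content: bytes, based_on: int) -> int:
        """Store a new version only if the submitter saw the current one."""
        if based_on != self.head():
            raise StaleUploadError(
                f"based on version {based_on}, but current is {self.head()}"
            )
        self.versions.append(content)
        return self.head()


store = FileStore()
store.upload(b"frame v1", based_on=0)

# Two contributors both start from version 1 at the same time.
store.upload(b"frame v2 by Alice", based_on=1)
try:
    store.upload(b"frame v2 by Bob", based_on=1)
except StaleUploadError as err:
    print("rejected:", err)

# Bob fetches the current version and retries; nothing was overwritten.
store.upload(b"frame v3 by Bob", based_on=store.head())
print(store.head(), "versions kept")
```

In this model no contribution is lost: the second submitter is told about the conflict and resubmits on top of the current version.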

Continued Argumentation

  1. Versions have a hash Key - wiki has the equivalent functionality of a hash key.
    1. I don't think that this is true. The problem with the wiki is that the latest version does not really have a hash. Older versions can be linked to, but the newest version always has the same link, so there is no permanent link. Or is there a way to address the current version so that the link stays valid in the future?
      1. OK, so the hash-key functionality is actually the permalink functionality. Sadly, it only links to the wiki page of the file, not to the file itself.
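One possible route to per-version file links, sketched under assumptions: as far as I know, the MediaWiki API's `imageinfo` property can list the stored revisions of a file with a timestamp, a SHA-1 of the content, and a direct file URL for each (`iiprop=timestamp|url|sha1`, `iilimit` for the number of revisions). The endpoint `https://wiki.example.org/api.php`, the file title, and the sample response below are placeholders, not real OSE data, and the code only builds the request URL and reads a response that has already been fetched.

```python
from urllib.parse import urlencode


def imageinfo_query_url(api_base: str, file_title: str, limit: int = 50) -> str:
    """Build an API request listing the stored revisions of one file."""
    params = {
        "action": "query",
        "format": "json",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "timestamp|url|sha1",
        "iilimit": limit,
    }
    return api_base + "?" + urlencode(params)


def revision_links(response: dict) -> list:
    """Extract (timestamp, sha1, url) for every file revision in a response."""
    links = []
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            links.append((info["timestamp"], info["sha1"], info["url"]))
    return links


# Hypothetical endpoint and file title, for illustration only.
print(imageinfo_query_url("https://wiki.example.org/api.php", "File:Example.stl"))

# Hypothetical response in the shape the API is assumed to return.
sample = {"query": {"pages": {"42": {"title": "File:Example.stl", "imageinfo": [
    {"timestamp": "2019-01-02T00:00:00Z", "sha1": "aaaa", "url": "https://wiki.example.org/images/Example.stl"},
    {"timestamp": "2019-01-01T00:00:00Z", "sha1": "bbbb", "url": "https://wiki.example.org/images/archive/Example.stl"},
]}}}}
for timestamp, sha1, url in revision_links(sample):
    print(timestamp, sha1, url)
```

If this holds on our wiki, the SHA-1 would identify the content of each version. Whether the URL of the newest revision stays stable after a later upload would still need to be checked.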

Discussion

I agree that given that you are using a wiki as a repository you would want to just upload a new version of a file and use it. I just think that MediaWiki was not intended to be used as a repository.

If I understand correctly, the reasons you gave for not using Git are:

  1. You don't control the data with GitLab
  2. Visual Version History
  3. Collaboration at scale
  4. It is currently used for a large set of parts (this was not included originally but I think it is important)

I still think Git has major advantages and want to give counterpoints to the arguments above.

1. You don't control the data with GitLab

Q: I don't understand this. Why don't you own the data? You can host it on your own server.

A: Then why not just use our server? That would mean hosting the data in two places, so why not just host it on our servers? And if GitLab decides to end service, we have to have a local backup. Or if it changes its interface or usage terms, we may have to rework significant parts of our workflow. We need absolute control of our data as we are a long-term project.

2. Visual Version History

Q: I think that Visual Version History should not be a problem: you can either just continue using the wiki and link to the Git files or, even better, have the visual version history created automatically by a commit hook on GitLab.

A: It could be created as a visual gallery like in the wiki, but it already exists in MediaWiki, so why reinvent the wheel? The wiki already has all the formatting to make this easy. And we don't want to require people to understand how to use Git platforms, which have a steeper learning curve than a wiki.
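For reference, the commit-hook idea from the question could look roughly like the sketch below. It assumes the hook receives the list of files changed in a commit, for example from `git diff-tree --no-commit-id --name-only -r HEAD`, and turns the preview images among them into MediaWiki `<gallery>` markup. The function names, file names, and the commit ID are hypothetical, and uploading the images or posting the markup to the wiki is left out.

```python
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".svg")


def changed_preview_images(diff_tree_output: str) -> list:
    """Pick the image files out of a one-path-per-line list of changed files."""
    names = [line.strip() for line in diff_tree_output.splitlines() if line.strip()]
    return [name for name in names if name.lower().endswith(IMAGE_SUFFIXES)]


def gallery_markup(paths: list, commit_id: str) -> str:
    """Render MediaWiki gallery markup with one captioned entry per image."""
    lines = ["<gallery>"]
    for path in paths:
        file_name = path.rsplit("/", 1)[-1]
        lines.append(f"File:{file_name}|{file_name} at commit {commit_id[:8]}")
    lines.append("</gallery>")
    return "\n".join(lines)


# Hypothetical hook input: files changed in one commit.
changed = "parts/bracket.fcstd\nparts/bracket.png\nREADME.md\n"
images = changed_preview_images(changed)
print(gallery_markup(images, "0123456789abcdef0123456789abcdef01234567"))
```

This only covers generating the gallery text; it does not answer the objection that the wiki already provides the gallery without any extra tooling.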

3. Collaboration at scale

Q: I think there is a case that the access privileges are much better with Git. I believe that pull requests are probably necessary in a very large-scale collaboration.

A: That is specific to software workflows. In hardware workflows, authority is more distributed. The builder has ultimate authority. A software platform like Git relies on commits and commit control. A hardware platform needs to rely on 'non-commits'. I mean that every design is valid until it is tested. Because there is no uniform compiler in hardware, or it is more expensive to 'compile' or build, much work by default remains non-reconciled or 'un-committed' - as a potential viable fork. So in summary, hardware by its nature requires no access privileges, as decision-making (on commits or reconciliation) is much more distributed in hardware. I.e., the guy in Africa, at the North Pole, in Europe, and in a rural American town all need equal access - and whoever builds, decides. So permissions are simple: all source is open, and any build is an effective fork. So it seems that the workflow from software is the general idea, but the practices are subtly different. Specifically: you don't need access privileges - the default is all open and the builder is the actual committer. You can't assign roles and privileges in hardware - it doesn't make sense.

4. It is currently used for a large set of parts

Q: I think this point is very strong and that the transition could be a lot of work. But I still think that it could be worth it, and it should be tried out.

A: Question 3 implies that the problem statement is not reconciliation (commits). It's more about who puts in the effort to build. Yes, we can all collaborate on digital design and treat it like software - i.e., someone takes control of commits. But because you cannot pre-allocate the commit-control role, you cannot scale in the same way in hardware as in software.