Analysis of PLM Software Conflict Resolution

From Open Source Ecology

Question

How does professional-grade PLM software resolve file conflicts? And how does OSE achieve even higher performance using simple online tools? Here is an assessment of the state of the art in each.

Context

Question to Yorik, core FreeCAD developer: OSE will launch a $250k Incentive Challenge on the HeroX platform in September 2020 to build an open source, pro-grade, 3D-printed cordless drill. We expect thousands of participants.

Regarding file conflict resolution - can you comment on what I know about this right now (link below) and on what is currently available in FreeCAD? We plan to simply use our wiki for FreeCAD file version history, with Annotated change log pictures uploaded manually to make design changes transparent - allowing thousands of people to contribute to the same design in near real-time.

See

https://wiki.opensourceecology.org/wiki/Analysis_of_PLM_Software_Conflict_Resolution#Problem_Statement

What are your thoughts?

Current Solution

Existing

  1. Break down machines according to Module Based Design, and save components as small as possible.
  2. Lock in admissible parts by merging them into official repository. Delete redundant parts. Needs a strong manager + cooperation of community via Annotated Visual History (see Incentive Challenge Judging Criteria).
  3. Create containers that load the individual components - as FreeCAD a2+ Workbench.
  4. Reduce the size of the container to only file names.

Problem Statement

PLM software is not designed for mass collaboration. Typical teams in industry resolve conflict by checking out a file and locking it down. This does not work for OSE because a checked out file means that nobody else can work on it concurrently.

The ideal solution is real-time collaboration. Semi-real-time collaboration can occur when a person is connected online to a repository and FreeCAD downloads changes from other contributors on a timeframe of about every minute. This is undesirable, as collaborative waste would occur: people have to negotiate conflicting changes with one another, and any incompatible change must result in a fork. This can be resolved simply by starting a fork in the first place and submitting a pull request into the main branch later. This is the mechanism that GitHub uses.

However, the GitHub mechanism of high-level conflict resolution does not work for OSE, because an entire repository must be cloned. Because CAD files are so much larger than source code, storage limits are reached very quickly with this approach. For example, to change a single 10 MB file in a project of, say, 1 GB total size, each contributor must clone the whole project, so 1000 forks take 1 TB. This is storage-prohibitive.
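The arithmetic behind this example can be written out as a small sketch. The first function follows the numbers in the paragraph above; the second is an assumption added for comparison only, imagining a store that keeps unchanged data once and only each contributor's changed file separately. Function names are hypothetical and decimal units (1 GB = 1000 MB) are assumed.

```python
def full_clone_storage_gb(project_size_gb, contributors):
    """Total storage when every contributor holds a complete copy of the project."""
    return project_size_gb * contributors


def shared_storage_gb(project_size_gb, changed_file_mb, contributors):
    """Assumed alternative: one shared copy plus one changed file per contributor."""
    return project_size_gb + contributors * changed_file_mb / 1000


# Numbers from the text: 1 GB project, one 10 MB file changed, 1000 contributors.
full = full_clone_storage_gb(1, 1000)    # in GB
shared = shared_storage_gb(1, 10, 1000)  # in GB
print(f"full clones: {full} GB, shared store: {shared} GB")
```

Under these assumptions the full-clone case comes to 1000 GB (1 TB), while the shared case stays around 11 GB, which is the gap the solutions discussed on this page try to close.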

Potential Solutions

Version Control for PLM in FreeCAD is a complex, but solvable problem. There are many existing complex software layers that try to solve many of the issues. Most are open source. The difficulty is adapting them and putting them together in an easy to use software package.

Ideally, CAD data would be stored in a format that is easy to diff and to place under version or revision control. A FreeCAD workbench with an understandable and somewhat automated workflow might avoid the archive format and instead put the XML and small binary files into a folder managed by a modified git protocol, with tweaks such as binary diff enabled so that commits and branches don't create data copies. The workbench may need settings to help the user control branching and keep commits and diffs reasonable, as well as to manage and even cull dead-end branches when needed. The ability to see files being worked on (checked out) by other users would help enable communication and planning about versions before any conflicts are created. A voting system may also help manage decisions when dealing with large groups of contributors. Changing the FreeCAD file format, or not using the archive format for collaboration, may also enable more fine-grained control over versioning different types of data objects in different ways, similar to proprietary PLM CAD sharing platforms.

Git LFS is not ideal because it stores large (100MB-1GB) files outside the git repo with pointers so they are not differenced or version controlled the same way.


Controlling forks and preventing copies may be better addressed server-side. As with branches and differential compression, there is no reason (except RAID and backups) to store multiple copies of the same data. A web-based git implementation (GitLab and GitHub) may have internal software solutions or rely on lower software and/or hardware layers such as versioning file systems and data deduplication.
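The deduplication idea can be illustrated with a minimal sketch of a content-addressed store, in which each fork only records which content hash sits at which path and identical file contents are kept once. The `BlobStore` class, the fork names and the file names are hypothetical; this is not a description of how GitLab or GitHub store forks internally.

```python
import hashlib


class BlobStore:
    """Hypothetical server-side store: identical file contents are kept only once."""

    def __init__(self):
        self.blobs = {}  # sha256 hex digest -> file contents
        self.forks = {}  # fork name -> {path: digest}

    def put(self, fork, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # stored only if not already present
        self.forks.setdefault(fork, {})[path] = digest
        return digest

    def get(self, fork, path):
        return self.blobs[self.forks[fork][path]]

    def stored_bytes(self):
        return sum(len(data) for data in self.blobs.values())


store = BlobStore()
base = {"frame.FCStd": b"frame-v1" * 1000, "motor.FCStd": b"motor-v1" * 1000}

# 100 forks of the same two-file project.
for i in range(100):
    for path, data in base.items():
        store.put(f"fork-{i}", path, data)

# One contributor changes one file in their fork.
store.put("fork-7", "motor.FCStd", b"motor-v2" * 1000)

print(len(store.blobs), "unique blobs,", store.stored_bytes(), "bytes stored")
```

In this toy case, 100 forks plus one change occupy the space of three files rather than two hundred, since every fork is just a table of pointers into the shared blobs.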

Even a modified git protocol may not be ideal for CAD collaboration, but there are other revision and version control protocols that are also OSS. The primary reason this project has yet to be undertaken is likely the sheer size of it compared to other FreeCAD workbenches and a large amount of more immediately important fixes and additions to FreeCAD, which is still early beta software.

Related

Feedback

Yorik 2

No, unfortunately... The only world I know where these things are possible is text (and writing apps like LibreOffice). I think these systems rely heavily on diffing to be able to figure out what two people did to the same file at the same time... Because it is concentrated. What you change in a text lives in one single place in the file too. In images or 3D models or CAD formats, often when you change something, you are actually changing several different parts of the file.

However FreeCAD has this "transaction" system, that is used to register operations to the undo/redo stack. You could imagine a system where users are sending transactions to a kind of master queue, and that master queue gets processed one by one... That could form an interesting project.
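A minimal sketch of such a master queue might look as follows. It does not use FreeCAD's actual transaction API: the `Transaction` and `MasterQueue` classes, the property names and the last-write-wins behaviour are all assumptions made for illustration.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Transaction:
    """Hypothetical edit record: one user sets one property of one object."""
    user: str
    obj: str
    prop: str
    value: object


class MasterQueue:
    """Applies submitted transactions to a shared model strictly one by one."""

    def __init__(self, model):
        self.model = model      # {object name: {property: value}}
        self.pending = deque()  # transactions waiting to be applied, in arrival order
        self.log = []           # applied transactions, i.e. the model history

    def submit(self, tx):
        self.pending.append(tx)

    def process(self):
        while self.pending:
            tx = self.pending.popleft()
            # Assumption: a later transaction simply overwrites an earlier one.
            self.model.setdefault(tx.obj, {})[tx.prop] = tx.value
            self.log.append(tx)
        return self.model


model = {"Piece": {"x_mm": 10}}
queue = MasterQueue(model)
queue.submit(Transaction("alice", "Piece", "x_mm", 20))
queue.submit(Transaction("bob", "Piece", "x_mm", 25))
queue.submit(Transaction("bob", "Bracket", "length_mm", 40))
queue.process()
print(model)
```

Because the queue serializes edits, two users touching the same property never produce a merge conflict; the cost is that the later edit silently wins, so a real system would still need some way to flag or review such overwrites.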

There are other possibilities, ex https://speckle.systems/ It's a server/client system that would basically do all that for you: keep a model, allow clients to change that model in real time, and notify other connected clients of the changes. I started working on a basic implementation in FreeCAD some time ago ( https://github.com/yorikvanhavre/WebTools ) but stopped because the guy behind speckle was working so crazily fast that the API changed faster than I could code :) I should have a new look at it, it must have settled down now...

Marcin

Thanks for your thoughtful feedback, and I will keep you in the loop. I think we will come up with something really good - otherwise the incentive challenge will be full of collaborative waste instead of delivering the promise of open collaboration.

Do you know of any expensive professional grade high resource solution where 100 or so people are able to work on a CAD file in realtime with changes updated in realtime on one's screen? What is the closest to this outside of the Verse example for Blender? Or do the big guys not even touch this because of technology limitations?

Yorik

Hi Marcin,

Hmm, complex problem, lots of people have been discussing and arguing about it for a long time...

My two cents:

True real-time collaboration requires a complex, dedicated protocol where each and every "move" is registered. The only one I know of was something developed by Eskil Steenberg for Blender (and an online game he did called "love") called Verse. It kind of worked. But there is a huge amount of data to be transmitted, so it failed quite easily. But you could really see other people push the vertices in real time...

While this is good for playing and "rough 3D sketching", it's not that interesting for more accurate design. Changes are so fast and so many that the "changes stack", or model history, becomes absurdly complex, and therefore useless. Also, speaking from experience, comparing working with dropbox-like solutions, where each file save gets recorded, with working with git-like solutions, where commits are a decision of the user, with a meaningful message and a chosen set of files, I wouldn't hesitate one second to say the latter is far more interesting. There is an abysmal difference in the control you can have over the whole design, who did what, when and why, and just by reading the git log you get a fairly good idea of the whole design process.

I would say something like this: Reviewing 3D files together, in real-time, maybe with the ability to mark or annotate them, would be tremendously useful. Modeling and committing changes to the model, however, should be a more carefully thought and undertaken process, with better control over each "step", what each of these steps contains being something that is decided by the designer, and not by an automated system.

The way git handles binary files like FreeCAD files is indeed an issue. FreeCAD files are actually zip files, so they are handled as a binary blob by git. Meaning, on each commit the whole file is stored again. If you commit a 10 MB file ten times, your git folder stores 100 MB.

This is very common as soon as you start working with non-text files. Even if the file format is text-based instead of binary, more often than not, a simple change in a 3D file (moving a piece 10mm) creates many changes spread all over the file.

Speaking strictly about FreeCAD (but it might apply to other formats too), there are however several possible paths to attack this:

1) if you unzip a FreeCAD file, 90% of its contents is text. There are many projects out there whose purpose is to unzip files when committing to git, and rezip after pulling: https://forum.freecadweb.org/viewtopic.php?f=22&t=8688&start=20 This could help dramatically reduce the size of a git repo containing FreeCAD files (if you move a piece by 10mm, only a pretty small fraction of the file will change) but it will still be hard to get a human-readable summary of what has changed, which is another big advantage of working with git.

2) I started working some time ago on a small script ( https://github.com/FreeCAD/FreeCAD/blob/master/src/Tools/fcinfo ) whose purpose is to print a "text representation" of a FreeCAD file. The idea is that you could use this script to produce human-understandable diffs between two FreeCAD files (ex: object "Piece" was at x=10mm in the old file, now it is at x=20mm). Git allows you all kinds of tricks to use your own diff programs for specific filetypes.
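The kind of comparison described in path 2 can be approximated with a short sketch. It does not use fcinfo or reproduce its output: it simply reads `Document.xml` out of two archives and prints a plain unified diff. The `make_archive` helper builds a stand-in archive with deliberately simplified XML that is not the real FreeCAD schema, and all function names are hypothetical.

```python
import difflib
import io
import zipfile


def read_document_xml(fcstd_bytes):
    """Return the lines of Document.xml stored inside a zip-based FreeCAD file."""
    with zipfile.ZipFile(io.BytesIO(fcstd_bytes)) as archive:
        return archive.read("Document.xml").decode("utf-8").splitlines()


def diff_documents(old_bytes, new_bytes):
    """Unified diff between the Document.xml members of two archives."""
    old = read_document_xml(old_bytes)
    new = read_document_xml(new_bytes)
    return list(difflib.unified_diff(
        old, new, "old/Document.xml", "new/Document.xml", lineterm=""))


def make_archive(x_mm):
    """Stand-in for a real .FCStd file (simplified XML, for illustration only)."""
    xml = (
        "<Document>\n"
        '<Object name="Piece">\n'
        f'<Placement x="{x_mm}"/>\n'
        "</Object>\n"
        "</Document>\n"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("Document.xml", xml)
    return buffer.getvalue()


changes = diff_documents(make_archive(10), make_archive(20))
print("\n".join(changes))
```

On this toy input the diff isolates the single placement line that moved from 10 to 20. Real files are likely to be noisier, which is the motivation for a dedicated text representation rather than a raw XML diff.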

Basically this won't solve the main problem of "what happens when two people commit to the same file at the same time". No software I know of is able to work out this kind of problem with binary files. Unzipping a FreeCAD file might allow partial merges, though. But this needs some heavy testing, and will fail in many cases. The .brep files, for example, that store the shape of each object, although they are text files, often change drastically with a very simple change in the model, and will probably not be reconcilable/mergeable.
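The unzip-on-commit, rezip-after-pull idea from path 1 can be sketched with the standard `zipfile` module. This is only an outline under assumptions: the function names and the member name `Shape.brp` are hypothetical, the demo uses a stand-in archive rather than a real FreeCAD file, and whether FreeCAD accepts a repacked archive in every case has not been verified here.

```python
import tempfile
import zipfile
from pathlib import Path


def unpack_fcstd(fcstd_path, out_dir):
    """Extract every member so git can track the contents as individual files."""
    with zipfile.ZipFile(fcstd_path) as archive:
        archive.extractall(out_dir)
        return sorted(archive.namelist())


def repack_fcstd(src_dir, fcstd_path):
    """Rebuild a zip container from a directory of previously unpacked members."""
    src = Path(src_dir)
    with zipfile.ZipFile(fcstd_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(p for p in src.rglob("*") if p.is_file()):
            archive.write(file, file.relative_to(src).as_posix())


with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)

    # Stand-in archive with hypothetical member names.
    original = tmp / "part.FCStd"
    with zipfile.ZipFile(original, "w") as archive:
        archive.writestr("Document.xml", "<Document/>\n")
        archive.writestr("Shape.brp", "placeholder shape data\n")

    names = unpack_fcstd(original, tmp / "unpacked")  # this directory would be committed
    rebuilt = tmp / "rebuilt.FCStd"
    repack_fcstd(tmp / "unpacked", rebuilt)           # this would run after a pull

    with zipfile.ZipFile(rebuilt) as archive:
        roundtrip = {name: archive.read(name) for name in archive.namelist()}

print(names)
```

Once the members are tracked separately, git can at least attempt a line-based merge of the XML parts, while the shape files would likely still conflict, as noted above.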

But even so there is certainly huge progress to be made, and I believe these zip-based compound formats like FreeCAD's (LibreOffice uses that system too, and nowadays, would you believe, even Microsoft: docx and xlsx are also unzippable) are the best possible compromise between binary formats and text-based formats...

I'd love to know how this question is going forward on your side, please keep me informed!