How quickly will a repository become 'too big' due to LV binary files?

kristen · ‎07-09-2014

Hello Folks,

I am considering using a DVCS tool and have been researching mostly Git and also Mercurial. I am still very new to DVCS concepts and haven't started using them with LabVIEW yet, so please bear with me.

For either program, I am concerned about the entire file version history being stored locally and getting too big because LabVIEW files are binary. There are lots of discussions online about potential issues with storing large binary files (like an image or a dataset), but not a lot of information about progression of a typical LabVIEW project and how soon the entire set of files gets 'too big'.

Are you finding that the entire file history of a LabVIEW project does become cumbersomely large on disk due to the binary file size? Is it best to use for smaller projects or to always separate work into smaller projects? How would shared use of a reuse library be incorporated if projects should be kept small/separate? Any tips/experiences would be appreciated!

Thanks,

Kristen

Matthew_Kelton · ‎07-09-2014

Kristen,

I have no issues using Git with respect to repository storage. Frankly, hard drive space is cheap, and this should not be a reason to avoid a VCS. The discussions online need to be put into perspective as well: many of them refer to much older versions of the VCS tools.

As you know, LabVIEW uses binary files, and there is no way around this. With Subversion, you know your local working copy will take up the same amount of space (roughly) as your code. That will never change. You will have to be constantly connected to your svn server, though. For me, that's a problem.

I am out in the field doing work, and don't want my vcs to prevent me from using scc, so a DVCS makes sense. I gladly trade whatever hard drive space I sacrifice. I picked a random repo and it was 3x bigger than my code size. But, it also has lots of branches in it.

With a DVCS, you do not need to have all branches on your local PC. If HD space becomes an issue, you can keep just the ones you are currently working with. Your server can still have all of them, though.
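
For anyone wanting to try the "only some branches locally" idea, here is a minimal sketch using a single-branch clone. Everything in it is made up for illustration: the throwaway temp directory, the stand-in "server" repo, and the branch names `main` and `feature-x`. It is one way to do it, not necessarily how Matthew has his repos set up.

```shell
# Self-contained sketch: everything lives in a throwaway temp directory.
work=$(mktemp -d)
ident="-c user.name=Example -c user.email=example@example.com"

# Stand-in for the server repo, with two branches (names are hypothetical).
git init -q "$work/server"
git -C "$work/server" symbolic-ref HEAD refs/heads/main
git -C "$work/server" $ident commit -q --allow-empty -m "Initial commit"
git -C "$work/server" branch feature-x

# Clone only the branch you are working on; the server keeps the rest.
git clone -q --single-branch --branch feature-x "$work/server" "$work/local"

# List the remote-tracking branches that made it into the local clone.
remote_branches=$(git -C "$work/local" branch -r)
echo "$remote_branches"
```

With a real server you would replace `"$work/server"` with your repository URL and skip the setup steps.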

Make sure all your code has the option to separate compiled code checked. This will prevent recompiles of unchanged code from being recommitted to your repo.

In git, you can use submodules to keep separate repos for different sets of code. This is what I have done for my reuse libraries. Each library has its own repo, and then there is a master repo referencing the others as submodules. This requires a bit more work when originally cloning the repo, but it keeps history cleaner, and I can deprecate a set of code pretty easily by simply removing the submodule. The repo for the submodule still exists and I can get to it whenever I wish, but the main repo of reuse code has dropped it (and I don't have to search through its history for the old files if I ever need them again).
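
The submodule layout described above can be sketched roughly as follows. The repo names (`reuse-lib`, `master-repo`), the `libs/` folder, and the temp directory are hypothetical, chosen only so the script runs on its own. The `protocol.file.allow=always` setting is there only because this demo uses local paths as submodule URLs, which recent Git versions block by default; it should not be needed with a normal network remote.

```shell
# Self-contained sketch: all repos live in a throwaway temp directory.
work=$(mktemp -d)
ident="-c user.name=Example -c user.email=example@example.com"

# 1. A hypothetical reuse library with its own repo and history.
git init -q "$work/reuse-lib"
git -C "$work/reuse-lib" $ident commit -q --allow-empty -m "Initial reuse library commit"

# 2. The master repo references the library as a submodule.
git init -q "$work/master-repo"
git -C "$work/master-repo" -c protocol.file.allow=always \
    submodule add "$work/reuse-lib" libs/reuse-lib
git -C "$work/master-repo" $ident commit -q -m "Add reuse-lib as a submodule"

# 3. A fresh clone needs the extra flag to fetch the submodule contents too.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$work/master-repo" "$work/fresh-clone"
```

Step 3 is the "bit more work when originally cloning": without `--recurse-submodules`, the submodule folder in the fresh clone starts out empty until you run `git submodule update --init`.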

Make sure you don't put your built EXE and installers in your source repo. If you want to track them in a repo, create a different repo just for them, as that will stay large (although incremental commits are really not too big). I accidentally committed a 700MB installation package to a repo and pushed it to the network. It then took an hour to do a push or pull from the network (internet). I had to do some lower level work on the repo to get the commit removed.
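
One way to guard against accidentally committing build outputs is an ignore file. The sketch below appends two rules to `.gitignore` in the current directory. The folder name `builds/` is an assumption about where your build specifications write their output, so adjust it to your own project.

```shell
# Append ignore rules for build outputs to .gitignore in the current directory.
# "builds/" is a hypothetical output folder name; change it to match your project.
cat >> .gitignore <<'EOF'
# Built applications and installers stay out of the source repo
builds/
*.exe
EOF
```

Ignore rules only affect files that are not tracked yet, so this will not remove an installer that has already been committed.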

I don't think you'll regret git or Hg (although during my testing, Hg was SLOW). If you have never used scc before, it will change your world, and you won't know how you ever got along without it. If you are coming from svn or another centralized system, and are away from the network as often as I am, you won't know how you got along without it either.

Honestly the biggest issue is going to be resolving conflicts and doing merges or compares.

kristen · ‎07-10-2014

Thanks Matthew!

This is very helpful. I have used CVCS for a while and am starting to get my head wrapped around the DVCS model.

I am thinking I will use TortoiseGit and set up the LabVIEW Merge and Diff tools through there. I have typically avoided merging in the past, but it seems like Git can make that more straightforward to do.

Matthew_Kelton · ‎07-10-2014

Kristen,

I use TortoiseGit and find it very useful. My biggest daily complaint about it is that the icon overlays tend to get "lost" and are out of date. The client still knows what is going on if I commit, but sometimes the icons do not represent exactly what is going on. When I was using bzr, the TortoiseBzr client had a refresh option which would force the client to check the file status. I have not seen that in the git client.

Matthew

FabiolaDelaCueva · ‎07-10-2014

I agree with Matthew that once you have been using DVCS you will never want to go back.

The way I manage reusable libraries is through VIPM. I have a separate repository for the source code of each reusable library and then build a package. The projects that use that reusable code have a VI Package Manager configuration file (.vipc) that specifies what version of the package is used for each project. This way, when I am moving between projects that use different versions of the package, I just have to apply the configuration and LabVIEW has the correct version for me in vi.lib or user.lib.

As Matthew said, it is important to separate source code from compiled code. Make sure everyone on your team has their LabVIEW configured this way, and configure your project to mark all existing code as having source code separated from compiled code. I was on a project once where all the new VIs added by one of the team members kept coming in without this separation, and he swore he had configured his LabVIEW to separate source code... alas, he had not.

Regards,

Fab


Jeremy_Marquis · ‎07-10-2014

Fab's post sounds oh, so familiar.

FWIW, we have been using SourceTree and I think it's pretty slick.

Back to your original question: as long as you keep just your source code in your repo (no built exes and especially no installers), you would be hard pressed to get a repo that reaches 2 GB in size. Documentation is a personal choice. If I generate it, then it goes in the repo; I think it's worth the space for the versioning/sharing.
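
If you want to see how big a repo's history actually is on disk, Git can report it directly. The sketch below creates an empty throwaway repo only so the commands run on their own; in practice you would set `repo` to the path of your own clone.

```shell
# Throwaway repo for illustration; point "repo" at your own clone instead.
repo=$(mktemp -d)
git init -q "$repo"

# Report loose and packed object sizes in human-readable units.
size_report=$(git -C "$repo" count-objects -vH)
echo "$size_report"
```

The `size-pack` line is the one to watch over time, since that is where the compressed history ends up after Git repacks the repository.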

Jeremy_Marquis · ‎07-10-2014

One more thing, for me the difficulty in leaping from CVCS to DVCS was all in the workflow. CVCS is pretty straightforward, but the fact that DVCS workflows are so flexible left me wondering how to use it. Atlassian (who make the free SourceTree client) have a simple but intuitive intro to it on their website.

kristen · ‎07-11-2014

I really appreciate all of these tips!

I have definitely been studying the workflows as I try to understand DVCS. I really like GitFlow and am seeing comments that it can be nicely incorporated with SourceTree.

Does SourceTree provide everything you need to hook into Git and your repositories? Were you able to configure the LV Diff and Merge tools and only use this interface for code check in/out/merge/branch/etc?

Matthew_Kelton · ‎07-11-2014

I have not tried SourceTree since the initial release. It was so dreadfully slow I uninstalled it 20 minutes after installing it. I suggest trying a couple of different clients to see which you prefer to use.

Jeremy_Marquis · ‎07-13-2014

I have not gotten any git client to successfully use LabVIEW's diff or merge tools. But then again, how many times in the last 10 years have I used those tools? Maybe once.

For reference, I have one project with 4 developers who have been using Git and SourceTree for the code base for almost a year. The only times we have had Git merge conflicts were due to user error (not following the workflow). So I find I haven't needed LabVIEW's diff/merge tools, just like before we were branching and merging.

The key is to not have people working in the same VIs at the same time. Separating compiled from source code is required for this to work. Your application architecture should also be modular to enable this division of work.

The only real gotcha I have found is the xml-based files: the project file and class libraries. Class libraries you can just have different people responsible for "their" libraries. But the project file, everyone touches that. Every time you code you are modifying that file, and it is a real problem. The workaround we are using for that is we usually don't commit our project file, just one guy is the one that edits it. If you added a bunch of classes, etc., you send your lvproj file to that one unhappy guy that has to be master of the project file.

BTW, to keep your class library files from being corrupted by Git's text merges, you can set your client to treat them as binary files, just like VIs.
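
In plain Git, one way to get this behavior is a `.gitattributes` file in the repo root. The sketch below is an assumption about which extensions you want covered: `*.lvclass` is the class library case mentioned above, while `*.vi`, `*.ctl`, and `*.lvlib` are added here as likely candidates. Individual GUI clients may offer their own setting for this.

```shell
# Append attribute rules to .gitattributes in the current directory.
# "binary" is Git's built-in macro for -diff -merge -text, so Git will not
# attempt a textual merge or line-ending conversion on these files.
cat >> .gitattributes <<'EOF'
*.vi      binary
*.ctl     binary
*.lvclass binary
*.lvlib   binary
EOF
```

With these rules, a conflicting change to a class file shows up as a binary conflict, where you pick one side, rather than as merged XML with conflict markers inside it.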
