Looking for a way to set up Git correctly


* Looking for a way to set up Git correctly
@ 2010-11-11  3:25 Dennis
  2010-11-11  9:38 ` Alex Riesen
  2010-11-11 13:25 ` Enrico Weigelt
  0 siblings, 2 replies; 5+ messages in thread
From: Dennis @ 2010-11-11  3:25 UTC
  To: git

I have a situation.

I have started a web project (call it branch1), and have maintained it 
without a version control system for quite some time.
Then, I copied it to another folder (branch2) and while the project remained
essentially the same, I changed a few internal paths and some
variable names inside the files.
Then, a few months later on, I copied branch2 to a folder called branch3 and 
also modified some of the variable names and some of the internal structure 
of the files.

Thus I ended up with 3 folders on my local HDD with pretty much the same 
file names and folder structure and everything, and most of the file 
content, except those small deltas that made those files different for each 
branch.

I guess it's never too late, and now I want to put these 3 projects into a 
version control system, and I chose git.

Now, this can be either really simple or really complicated.  My first 
question is:  how do I set the repository up in the proper way so that I can
work on all 3 projects separately, with the additional possibility of working on
branch1 only and later committing my changes to branch2 and branch3?  (Since the
projects are virtually identical, a fix in one branch usually needs to be
propagated to the other branches.)
First, I assume I will use a single repository for this.  Then, do I simply 
set up 3 branches and start using them, or is there a way to set git up to 
capitalize on the projects being nearly identical?

My second question is that each branch has a huge folder with image data.
By huge I mean 1 to 4 GB, depending on the branch.  Since images are not
directly relevant to the development work, is there a way to not include
those folders in git?  To be honest though, I probably should include them,
but I wanted to ask about this separately as the git repository may get
large, since all 3 branches together may grow to 9 GB or so.

Thus I am looking for a git way to handle my situation.  Is this simple or
is it hard?
Are there any recommendations before I jump in?
Dennis 


* Re: Looking for a way to set up Git correctly
  2010-11-11  3:25 Looking for a way to set up Git correctly Dennis
@ 2010-11-11  9:38 ` Alex Riesen
  2010-11-11 13:25 ` Enrico Weigelt
  1 sibling, 0 replies; 5+ messages in thread
From: Alex Riesen @ 2010-11-11  9:38 UTC
  To: Dennis; +Cc: git

On Thu, Nov 11, 2010 at 04:25, Dennis <denny@dennymagicsite.com> wrote:
> I have started a web project (call it branch1), and have maintained it
> without a version control system for quite some time.
> Then, I copied it to another folder (branch2) and while the project remained
> essentially the same, I have changed a few of internal paths and some
> variable names inside the files.
> Then, a few months later on, I copied branch2 to a folder called branch3 and
> also modified some of the variable names and some of the internal structure
> of the files.
>
> Thus I ended up with 3 folders on my local HDD with pretty much the same
> file names and folder structure and everything, and most of the file
> content, except those small deltas that made those files different for each
> branch.
>
> I guess it's never too late, and now I want to put these 3 projects into a
> version control system, and I chose git.
>
> Now, this can be either really simple or really complicated.  My first
> question is:  how do I set the repository up in the proper way where I could
> work on all 3 projects separately, with additional possibility of working on
> branch1 only and later committing my changes to branch2 and branch3.  (Since
> projects are virtually identical, a fix in one branch usually needs to be
> propagated to other branches)
> First, I assume I will use a single repository for this.  Then, do I simply
> set up 3 branches and start using them, or is there a way to set git up to
> capitalize on the projects being nearly identical?

Assuming I've got the relationships of your "branches" right:

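$ cp -a branch1 branch && cd branch
$ git init
$ echo /huge-images/ >.gitignore
$ git add .gitignore; git add .; git commit; git branch branch1
$ git checkout -b branch2
$ # Copy the *contents* of branch2 over the checkout ("cp -a ../branch2 ."
$ # would create ./branch2). Files branch2 dropped must be deleted by hand.
$ cp -a ../branch2/. .
$ git add .; git commit
$ git checkout -b branch3
$ cp -a ../branch3/. .
$ git add .; git commit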

> My second question is that each branch has a huge folder with image data. By
> huge I mean 1 to 4Gb, depending on the branch.  Since images are not
> directly relevant to the development work, is there a way to not include
> those folders in git?  To be honest though, I probably should include them,
> but I wanted to ask about this separately as git repository may be get
> large, since all 3 branches may grow to 9Gb or so.
>
> Thus I am looking for a git way to handle my situation.  Is this simple or
> is is hard?

If you add the images you will eventually run into problems (heavy
swapping, for one).
Git is not really set up to work with big binary files (a file must fit into
memory completely).


* Re: Looking for a way to set up Git correctly
  2010-11-11  3:25 Looking for a way to set up Git correctly Dennis
  2010-11-11  9:38 ` Alex Riesen
@ 2010-11-11 13:25 ` Enrico Weigelt
  2010-11-11 16:46   ` Jonathan Nieder
  1 sibling, 1 reply; 5+ messages in thread
From: Enrico Weigelt @ 2010-11-11 13:25 UTC
  To: git

* Dennis <denny@dennymagicsite.com> wrote:

Hi,

> Now, this can be either really simple or really complicated.  My first 
> question is:  how do I set the repository up in the proper way where I 
> could work on all 3 projects separately, with additional possibility of 
> working on branch1 only and later committing my changes to branch2 and 
> branch3.  

As a first step you could create 3 separate git repos, one in each directory,
and add everything to them (git init, git add -A, git commit). Then
rename the branches properly (so instead of "master", they'll be called
"branch1", "branch2", "branch3" or something like that). Create another
(maybe bare) repo elsewhere, add it as remote to the three other ones
and push their branches upwards. Now you have 4 repos, 3 for working
on the individual branches and another for collecting them all (hub model).
You could also choose to throw the first three away and only work in
the last one.
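
A minimal sketch of that hub setup, run against throwaway directories. The
folder names, file contents, the remote name "hub" and the user identity
are all made up for illustration; only the git commands themselves are the
point.

```shell
#!/bin/sh
# Hub model sketch: three working repos pushing into one bare repo.
# Every name below (branchN, hub.git, index.php) is hypothetical.
set -e
top=$(mktemp -d)
cd "$top"

# Stand-ins for the three existing project folders.
for i in 1 2 3
do
	mkdir "branch$i"
	echo "variant $i" >"branch$i/index.php"
done

# The hub: a bare repository that only collects the branches.
git init -q --bare hub.git

for i in 1 2 3
do
	(
		cd "branch$i"
		git init -q
		git config user.name "Example User"
		git config user.email "user@example.com"
		git add -A
		git commit -q -m "Import branch$i as it is on disk"
		git branch -m "branch$i"          # rename the initial branch
		git remote add hub "$top/hub.git"
		git push -q hub "branch$i"
	)
done

# The hub should now carry one branch per working repository.
hub_branches=$(git --git-dir=hub.git for-each-ref --format='%(refname:short)' refs/heads)
echo "$hub_branches"
```

Note that the three imports share no history this way, so each branch
starts from its own root commit.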

> (Since projects are virtually identical, a fix in one branch 
> usually needs to be propagated to other branches)

In your case, cherry-pick might be the right tool for you.
You could also do a little bit of refactoring, making a 4th branch which
the other 3 are then rebased onto. Then you could do your fixes in that
branch and either merge it into the other 3 or rebase them onto it.

> My second question is that each branch has a huge folder with image data. 
> By huge I mean 1 to 4Gb, depending on the branch.  Since images are not 
> directly relevant to the development work, is there a way to not include 
> those folders in git?

See the .gitignore file.
Nevertheless, it might be useful to also have all the images in the
repo for backup reasons.
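
As a small sketch of the .gitignore route, assuming the image folder is
called "images" and sits at the top of the project (both assumptions, as
are the sample files):

```shell
#!/bin/sh
# Keep a top-level image folder out of the repository.
# Directory and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q

mkdir images src
echo "not really a photo" >images/photo.jpg
echo "<?php echo 'hi';" >src/index.php

# The leading slash anchors the pattern at the repository root,
# the trailing slash restricts it to directories.
echo /images/ >.gitignore

git add -A
tracked=$(git ls-files)
echo "$tracked"
```

The ignore rule only affects files that are not tracked yet, so it is best
put in place before the first "git add".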

BTW: if you're concerned about disk space, you could add the object dir
of the 4th (hub) repository to the 3 working repos as an alternate object
store (for example via .git/objects/info/alternates; run git-gc in the
hub repo before that!). Subsequent gc runs will remove the objects that are
already present in the hub. But beware! If you remove something in the
hub repo and run git-gc there, you could lose objects in the other repos!
(Maybe it would be wise to add the 3 working repos as remotes in the
hub and always run a git remote update before git-gc in the hub.)
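
A rough sketch of borrowing objects from the hub, assuming this is done
through the alternates file. The repository names and the single working
repo are made up, and the caveat above applies in full: the working repo
then depends on the hub keeping those objects.

```shell
#!/bin/sh
# Let a working repo borrow objects from the hub via alternates.
# Names (work, hub.git, branch1) are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"

git init -q work
(
	cd work
	git config user.name "Example User"
	git config user.email "user@example.com"
	echo "hello" >file.txt
	git add -A
	git commit -q -m "Initial import"
	git branch -m branch1
)

# The hub receives the branch and is packed first.
git init -q --bare hub.git
git -C work push -q "$top/hub.git" branch1
git --git-dir=hub.git gc -q

# Point the working repo at the hub's object directory, then gc.
echo "$top/hub.git/objects" >work/.git/objects/info/alternates
git -C work gc -q

# History must still be fully readable through the alternate.
git -C work fsck
subject=$(git -C work log -1 --format=%s branch1)
echo "$subject"
```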

cu


* Re: Looking for a way to set up Git correctly
  2010-11-11 13:25 ` Enrico Weigelt
@ 2010-11-11 16:46   ` Jonathan Nieder
       [not found]     ` <20101111190724.00vcimqm8w0cw8s0@dennymagicsite.com>
  0 siblings, 1 reply; 5+ messages in thread
From: Jonathan Nieder @ 2010-11-11 16:46 UTC
  To: Dennis; +Cc: git, Alex Riesen, Enrico Weigelt

(+cc: Dennis again, Alex)

Hi,

Enrico Weigelt wrote:
> * Dennis <denny@dennymagicsite.com> wrote:

>> Now, this can be either really simple or really complicated.  My first 
>> question is:  how do I set the repository up in the proper way where I 
>> could work on all 3 projects separately, with additional possibility of 
>> working on branch1 only and later committing my changes to branch2 and 
>> branch3.  
>
> As first step you could create 3 separate git repos in each directory
> and add everything to it (git init, git add -A, git commit). Then 
> rename the branches properly (so instead of "master", they'll be called
> "branch1", "branch2", "branch3" or something like that). Create another
> (maybe bare) repo elsewhere, add it as remote to the three other ones
> and push their branches upwards.

So this looks like so:

	for i in project1 project2 project3
	do
		(
			cd "$i" &&
			git init &&
			git add . &&
			git commit
		)
	done
	git init main
	cd main
	for i in project1 project2 project3
	do
		# assumes each repo's initial branch is named "master"
		git fetch "../$i" "master:$i"
	done
	cd ..
	mkdir away
	mv project1 project2 project3 away/

If you would like multiple worktrees (one for each branch, maybe) for
the main repo, you might want to look into the new-workdir script in
contrib/workdir (but do consider the caveats[1]).
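
Git 2.5 and later also ship a built-in "git worktree" command that covers
the same use case as the contrib script. A small sketch, assuming such a
Git version; the repository layout and names are made up:

```shell
#!/bin/sh
# One extra working directory per branch with "git worktree".
# Requires Git 2.5 or later; all names are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"

git init -q main
cd main
git config user.name "Example User"
git config user.email "user@example.com"
echo "base" >app.txt
git add -A
git commit -q -m "project1: initial import"
git branch -m project1
git branch project2

# Second working directory, checked out at project2, sharing main/.git.
git worktree add ../project2-tree project2 >/dev/null 2>&1

wt_branch=$(git -C ../project2-tree rev-parse --abbrev-ref HEAD)
echo "$wt_branch"
git worktree list
```

Unlike the contrib script, "git worktree add" refuses by default to check
out a branch that is already checked out in another worktree.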

>> (Since projects are virtually identical, a fix in one branch 
>> usually needs to be propagated to other branches)
>
> In your case, cherry-pick might be the right tool for you.

e.g., when project3 gets a new fix:

	git checkout project1
	git cherry-pick project3

> You could also do a little bit refactoring, making a 4th branch which
> the other 3 are then rebased onto.

Right, what is the actual relationship between these projects?  Do
they actually represent branches in the history of a single project?

Suppose project1 is historically an ancestor to project2, project3,
and project4, which are independent.  (Maybe project1 is the initial
version and projects 2,3,4 are ports to other platforms.)  You could
take this into account when initially setting up the branches, like
this:

	git init main
	cd main
	GIT_DIR=$(pwd)/.git; export GIT_DIR
	GIT_WORK_TREE=../project1 git add .
	GIT_WORK_TREE=../project1 git commit
	git branch -m project1
	for i in project2 project3 project4
	do
		git checkout -b $i project1
		GIT_WORK_TREE=../$i git add -A
		GIT_WORK_TREE=../$i git commit
	done

(and use gitk --all when done to make sure everything looks right)

Alternatively, you can rearrange the history afterwards:

	$ git cat-file commit project2 | tee project2
	tree 76db51024713f6ef191928a8445d48d39ab55434
	author Junio C Hamano <gitster@pobox.com> 1289324716 -0800
	committer Junio C Hamano <gitster@pobox.com> 1289324716 -0800

	project2: an excellent project
	$ git rev-parse project1
	$ vi project2
	... add a "parent <object id>" line
	    after the tree line,
	    where <object id> is the full object name rev-parse printed ...
	$ git hash-object -t commit -w project2
	$ git branch -f project2 <the object name hash-object prints>
	... repeat for project3 and project4 ...
	$ gitk --all;		# to make sure everything looks right

This is less convenient than it ought to be.  It would be nice to add
a "git graft" command to automate this procedure, which

 - interacts well with "git replace"
 - doesn't interact poorly with "git fetch" like .git/info/grafts does
 - could be more convenient to use than .git/info/grafts.
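
Later Git releases (2.1 and up, as far as I know) gained
"git replace --graft", which covers much of this wish. A sketch under that
assumption, with made-up branch contents; note that it records a
replacement ref rather than rewriting the commit itself:

```shell
#!/bin/sh
# Give project2's root commit a parent with "git replace --graft".
# Needs a Git that has "git replace --graft"; contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" >app.txt
git add -A
git commit -q -m "project1: initial import"
git branch -m project1

# project2 starts as an unrelated root commit, as after a plain import.
git checkout -q --orphan project2
echo "version 2" >app.txt
git add -A
git commit -q -m "project2: initial import"

# Record project1 as the parent of project2's commit.
git replace --graft project2 project1

parent=$(git rev-parse project2^)
echo "$parent"
```

The replacement lives under refs/replace/, so it is not transferred by an
ordinary fetch or push unless those refs are included explicitly.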

As the gitworkflows man page mentions, if you make your fixes on the
oldest branch they apply to (project1) and then merge to all later
branches, then the fixes will propagate forward correctly.  See the
"Graduation" and "Merging upwards" sections of gitworkflows for details.

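To make "merging upwards" concrete, here is a sketch with two branches,
where project2 was forked from project1. File names and contents are
invented for the example.

```shell
#!/bin/sh
# Fix on the oldest branch, then merge into the newer one.
# Branch contents are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

printf 'line one\nline two\n' >common.txt
git add -A
git commit -q -m "project1: initial import"
git branch -m project1

# project2 forks from project1 and adds its own file.
git checkout -q -b project2
echo "project2 only" >local.txt
git add -A
git commit -q -m "project2: local changes"

# The fix is made on the oldest branch it applies to ...
git checkout -q project1
printf 'line one (fixed)\nline two\n' >common.txt
git commit -q -a -m "Fix line one"

# ... and then merged upwards.
git checkout -q project2
git merge -q -m "Merge project1 fix into project2" project1

merged=$(head -n 1 common.txt)
echo "$merged"
```

This only works when the branches really share history, which is why the
initial import has to record project1 as an ancestor of the others.
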
>> My second question is that each branch has a huge folder with image data. 
>> By huge I mean 1 to 4Gb, depending on the branch.  Since images are not 
>> directly relevant to the development work, is there a way to not include 
>> those folders in git?

I would suggest tracking a symlink to another repository (or to a
directory tracked through other means, like unison).
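
A sketch of the symlink idea, assuming the images live in a sibling
directory called "image-store" (a made-up name) and a filesystem that
supports symlinks:

```shell
#!/bin/sh
# Track a symlink to the image directory instead of the images.
# Directory names are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"

mkdir image-store
echo "not really a photo" >image-store/photo.jpg

git init -q site
cd site
git config user.name "Example User"
git config user.email "user@example.com"

# Git records the link itself (mode 120000), not what it points to.
ln -s ../image-store images
git add images
git commit -q -m "Track images as a symlink"

entry=$(git ls-files -s images)
mode=${entry%% *}
echo "$mode"
```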

Hope that helps,
Jonathan

[1] If you have two worktrees for the same project with the
same branch checked out at a given moment, the results can be
confusing (changes made in one worktree will look like they have
been committed and undone in the other).

The "detached HEAD" feature (which git-checkout.1 explains) and
multiple worktrees do not interact so well: commits made while no
branch was checked out in one worktree will not be taken into account
when "git gc" runs (explicitly or implicitly!) in the other, so they
may be pruned.  This can be very disconcerting.


* Re: Looking for a way to set up Git correctly
       [not found]     ` <20101111190724.00vcimqm8w0cw8s0@dennymagicsite.com>
@ 2010-11-11 19:38       ` Jonathan Nieder
  0 siblings, 0 replies; 5+ messages in thread
From: Jonathan Nieder @ 2010-11-11 19:38 UTC
  To: denny; +Cc: git, Alex Riesen, Enrico Weigelt

denny@dennymagicsite.com wrote:

> I am still looking through your replies and getting familiar with
> git commands.

By the way, please ignore that GIT_WORK_TREE stuff I did.  It
probably works, but it's ugly. :)  That example could have been
written better as

	git init everything
	GIT_DIR=$(pwd)/everything/.git; export GIT_DIR
	(
		cd common-ancestor
		git add -A
		git commit
		git branch -m ancestor
	)
	(
		cd project1
		git checkout -b project1 ancestor
		git add -A
		git commit
	)
	... etc ..
	unset GIT_DIR

	cd everything
	git checkout project1
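
For more than one project, a runnable variant might look like the sketch
below. Two things in it are my substitutions, not part of the example
above: the branch is created with "git branch" plus "git symbolic-ref"
instead of "git checkout -b", because a checkout may refuse to switch (or
touch files) when the directory contents differ from the index, and the
final checkout uses -f so that the still-empty worktree gets populated.
Directory names and file contents are made up.

```shell
#!/bin/sh
# Import an ancestor and two descendants into one repository,
# using each existing directory as a temporary work tree.
# All names and contents are hypothetical.
set -e
top=$(mktemp -d)
cd "$top"
mkdir common-ancestor project1 project2
echo "base" >common-ancestor/app.txt
echo "variant 1" >project1/app.txt
echo "variant 2" >project2/app.txt

git init -q everything
GIT_DIR=$top/everything/.git; export GIT_DIR
git config user.name "Example User"
git config user.email "user@example.com"
(
	cd common-ancestor
	git add -A
	git commit -q -m "Common ancestor"
	git branch -m ancestor
)
for p in project1 project2
do
	(
		cd "$p"
		git branch "$p" ancestor                 # new branch at ancestor
		git symbolic-ref HEAD "refs/heads/$p"    # switch without touching files
		git add -A                               # index := this directory
		git commit -q -m "Import $p"
	)
done
unset GIT_DIR

cd everything
git checkout -q -f project1
content=$(cat app.txt)
parent=$(git rev-parse project2^)
echo "$content"
```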

[...]
> From a developer's point of view, working on projectX means making
> some changes and committing them to the repo for that project.  The
> developer may not be aware of other projects existing.

For concreteness, I am imagining these directories represent various
versions of the Almquist shell.  The common ancestor is the BSD4.3/Net-2
version and various projects may have built from there in different
directions: NetBSD sh, FreeBSD sh, dash.  (Yes, I am oversimplifying. :))

Now suppose they have diverged so wildly that it is never possible to
synchronize code with each other.  Instead, they can copy fixes, and
this is especially convenient when the fixes are phrased as diffs to
the common ancestor.

To facilitate this, Alice revives the BSD4.3/Net-2 sh project with a
"fixes only" policy.  Her daily work might look like this:

 $ git fetch netbsd
 $ git log netbsd/for-alice@{1}..netbsd/for-alice; # any good patches today?
 $ git cherry-pick -s 67fd89980; # a good patch.
 ... quick test ...
 $ git cherry-pick -s 897ac8; # another good patch.
 ... quick test ...
 ...
 $ git fetch freebsd
 ... and similarly for the rest of the patch submitters ...
 $ git am emailed-patch

Then to more thoroughly test the result:

 $ git checkout -b throwaway;	# new throw-away branch.[1]
 $ git merge netbsd/master;	# will the changes work for netbsd?
 ... thorough test ...
 $ git reset --keep master
 $ git merge freebsd/master;	# how about freebsd?
 ... etc ...

And finally she pushes the changes out.

> Without knowing anything about git for a moment, one ideal workflow
> is where a developer makes changes to projectX that touch the base
> and projectX specific features.  Then the developer commits them and
> pushes them to the main repo.  The main repo contains all projects.
> During the commit, changes to the base automagically get pushed to
> all projects that share that base.

If it is a matter of what files are touched, then maybe the base is
actually something like a library, which should be managed as a
separate project.  See the "git submodule" manual if you would like to
try something like this but still keep the projects coupled.

On the other hand, remaining in the situation from before:

Suppose Sam is the NetBSD sh maintainer.  The first step in working on
a new release might be

 $ git fetch ancestor
 $ git log -p HEAD..FETCH_HEAD;	# fixes look okay?
 $ git pull ancestor

since Alice tends to include only safe, well tested fixes.

Many changes Sam makes are specific to his project, but today he comes
up with a fix that might be useful for other ash descendants.

So instead of committing directly, he can try:

 $ git checkout for-alice;	# carry the fix to the for-alice branch
 ... test ...
 $ git commit -a;		# commit it.

If it is not an urgent fix, at this point he might do

 $ git checkout master;		# back to the main NetBSD branch, without the fix

and give the other projects some time to work on the patch and come up
with a better fix.  Or he might cherry-pick the commit from for-alice,
and even publish it and encourage others to cherry-pick directly from
him to get the fix out ASAP.

Notice that not all changes to the base files are necessarily useful
for other descendants of the ancestral program.  So in this example,
propagation of changes between projects is fairly explicit.

[1] "git checkout HEAD^0" would be more convenient.
See DETACHED HEAD in the git checkout manual if interested.

