* subdirectory-filter does not delete files before the directory came into existence?
@ 2010-12-14 22:21 Jan Wielemaker
  2010-12-14 23:03 ` Thomas Rast
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Wielemaker @ 2010-12-14 22:21 UTC
  To: git

Hi,

There is a lot of information about extracting a directory from a git
project.  One thing I failed to find though is the following:

I try to extract a directory.  The result is fine, but there is a lot
of history in the result from *before* the directory was added to the
project.  Why?  How can I get rid of this?

If you want to see for yourself, this is what I did:

	git clone git://www.swi-prolog.org/home/pl/git/pl-devel.git
	git clone pl-devel odbc
	cd odbc
	git filter-branch --subdirectory-filter packages/odbc --prune-empty \
		--tag-name-filter cat -- --all
	
Now use e.g. qgit to look at the history.  As from 03/07/2002, when
the packages/odbc directory was created, all looks just fine.  Before
though ...

	Thanks for any hints

		--- Jan


* Re: subdirectory-filter does not delete files before the directory came into existence?
  2010-12-14 22:21 subdirectory-filter does not delete files before the directory came into existence? Jan Wielemaker
@ 2010-12-14 23:03 ` Thomas Rast
  2010-12-15  9:50   ` Jan Wielemaker
                     ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Thomas Rast @ 2010-12-14 23:03 UTC
  To: Jan Wielemaker; +Cc: git

Jan Wielemaker wrote:
> I try to extract a directory.  The result is fine, but there is a lot
> of history in the result from *before* the directory was added to the
> project.  Why?  How can I get rid of this?
[...]
> Now use e.g. qgit to look at the history.  As from 03/07/2002, when
> the packages/odbc directory was created, all looks just fine.  Before
> though ...

That history is not connected to the filtered one.  git-filter-branch
alerts you to it with messages like

  WARNING: Ref 'refs/tags/V5.0.4' is unchanged
  WARNING: Ref 'refs/tags/V5.0.5' is unchanged
  WARNING: Ref 'refs/tags/V5.0.6' is unchanged
  WARNING: Ref 'refs/tags/V5.0.7' is unchanged

I haven't made up my mind if this is a bug report or a feature
request, but in any case you can delete all of them and the problem
goes away.
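
A minimal sketch of collecting the tag names from those warnings,
assuming they are printed exactly in the form shown above and arrive
on stderr (worth checking against your git version).  The helper name
unchanged_tags and the file unchanged.txt are made up for illustration.

```shell
# unchanged_tags (hypothetical helper): read filter-branch output on stdin
# and print the short name of every tag reported as unchanged.
unchanged_tags() {
    sed -n "s|^WARNING: Ref 'refs/tags/\(.*\)' is unchanged\$|\1|p"
}

# Possible use, not run here; inspect the list before deleting anything:
#   git filter-branch --subdirectory-filter packages/odbc \
#       --tag-name-filter cat -- --all 2>&1 | unchanged_tags > unchanged.txt
#   xargs -n 1 git tag -d < unchanged.txt
```

Only tags are picked up; an unchanged branch would show up as
refs/heads/... and is deliberately ignored by this sketch.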

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: subdirectory-filter does not delete files before the directory came into existence?
  2010-12-14 23:03 ` Thomas Rast
@ 2010-12-15  9:50   ` Jan Wielemaker
  2010-12-15 10:40   ` Jan Wielemaker
  2010-12-15 12:22   ` Jan Wielemaker
  2 siblings, 0 replies; 8+ messages in thread
From: Jan Wielemaker @ 2010-12-15  9:50 UTC
  To: Thomas Rast; +Cc: git

Dear Thomas,

On Wed, 2010-12-15 at 00:03 +0100, Thomas Rast wrote:
> Jan Wielemaker wrote:
> > I try to extract a directory.  The result is fine, but there is a lot
> > of history in the result from *before* the directory was added to the
> > project.  Why?  How can I get rid of this?
> [...]
> > Now use e.g. qgit to look at the history.  As from 03/07/2002, when
> > the packages/odbc directory was created, all looks just fine.  Before
> > though ...
> 
> That history is not connected to the filtered one.  git-filter-branch
> alerts you to it with messages like
> 
>   WARNING: Ref 'refs/tags/V5.0.4' is unchanged
>   WARNING: Ref 'refs/tags/V5.0.5' is unchanged
>   WARNING: Ref 'refs/tags/V5.0.6' is unchanged
>   WARNING: Ref 'refs/tags/V5.0.7' is unchanged

Thanks for the insight.  Catching these warnings and running git tag -d
on the tags they name gets me a nice and clean history.  Only ...  it
starts on 12/08/2008 instead of 03/07/2002.  This is (almost) consistent
with the filtering feedback, which says it rewrote 174 commits.  The
filtered and cleaned history contains 171.

This is a bit odd.  If I open qgit on the original (before filtering)
and show the history of odbc.c, it looks like a nice and continuous
one going back to 2002.  Also

   git log --oneline packages/odbc/odbc.c

shows a history that starts with "First public version of ODBC
interface"

Of course, this is a project with a long history that was converted
from CVS, but the history looks unbroken, so why does filtering a
directory break it?

> I haven't made up my mind if this is a bug report or a feature
> request, but in any case you can delete all of them and the problem
> goes away.

Isn't it true that you will have info from before introducing a
directory whenever there are tags that are older than the directory?
If that is the case, it looks wrong to me.  I want to filter the 
directory, so the repository from before the existence of the 
directory is not interesting.  Of course, things change if the
directory was created by renaming files that were already in
the repository.  I don't know what one should `expect' in that
case.  Here, the directory was added from new files, so it is
quite clear what one should expect.

	Regards --- Jan


* Re: subdirectory-filter does not delete files before the directory came into existence?
  2010-12-14 23:03 ` Thomas Rast
  2010-12-15  9:50   ` Jan Wielemaker
@ 2010-12-15 10:40   ` Jan Wielemaker
  2010-12-15 12:22   ` Jan Wielemaker
  2 siblings, 0 replies; 8+ messages in thread
From: Jan Wielemaker @ 2010-12-15 10:40 UTC
  To: Thomas Rast; +Cc: git

In addition to my previous reply: looking at the result of the
initial filter, if I remove all unchanged refs I lose the history
before 2008.  Qgit, however, shows a broken history at the start
of the directory in 2002.  If I keep deleting the tag that is
the head of the older stuff, I end up with what I hoped for in the
first place.  This is of course a bit tedious :-(

You can view the result at

   git://www.swi-prolog.org/home/pl/git/packages/odbc.git

I'll split some more packages.  Curious what is going to happen ...

	Regards --- Jan


* Re: subdirectory-filter does not delete files before the directory came into existence?
  2010-12-14 23:03 ` Thomas Rast
  2010-12-15  9:50   ` Jan Wielemaker
  2010-12-15 10:40   ` Jan Wielemaker
@ 2010-12-15 12:22   ` Jan Wielemaker
  2010-12-19  2:23     ` Thomas Rast
  2 siblings, 1 reply; 8+ messages in thread
From: Jan Wielemaker @ 2010-12-15 12:22 UTC
  To: Thomas Rast; +Cc: git

The reported problems also apply to the next module.  What appears to
work is this:

  * Walk through the history, finding the commit where the directory
  is created.
  * use git tag -l --contains <commit that created dir> to get the 
  tags we want to keep.
  * get all tags, use comm and delete the tags not in the `contained'
  set above.

Not very friendly, and I'm (with Thomas) unsure about the status of these
findings.  I would like to thank Thomas for giving me the right clue.
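
The first step can probably be scripted rather than done by eye.  A
minimal sketch, assuming it is run in the clone *before* filter-branch
rewrites anything (rewritten commits get new IDs, so an ID found this
way no longer applies afterwards) and that the oldest commit touching
the directory is the one that created it.  The function name is
hypothetical.

```shell
# first_commit_touching DIR (hypothetical helper): print the oldest commit
# reachable from HEAD that changed anything under DIR.
first_commit_touching() {
    git rev-list --reverse HEAD -- "$1" | head -n 1
}

# Possible use, not run here: list the tags that already contain the directory.
#   git tag -l --contains "$(first_commit_touching packages/odbc)"
```

With merges or skewed commit dates the "oldest" commit reported may not
be the one you expect, so it is worth checking the result with git show.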

	Regards --- Jan


* Re: subdirectory-filter does not delete files before the directory came into existence?
  2010-12-15 12:22   ` Jan Wielemaker
@ 2010-12-19  2:23     ` Thomas Rast
  2010-12-19  9:34       ` Jan Wielemaker
  0 siblings, 1 reply; 8+ messages in thread
From: Thomas Rast @ 2010-12-19  2:23 UTC
  To: Jan Wielemaker; +Cc: git

Jan Wielemaker wrote:
> The reported problems also apply to the next module.  What appears to
> work is this:
> 
>   * Walk through the history, finding the commit where the directory
>   is created.
>   * use git tag -l --contains <commit that created dir> to get the 
>   tags we want to keep.
>   * get all tags, use comm and delete the tags not in the `contained'
>   set above.
> 
> Not very friendly, and I'm (with Thomas) unsure about the status of these
> findings.  I would like to thank Thomas for giving me the right clue.

Now I finally remember where I knew this problem from:

  http://article.gmane.org/gmane.comp.version-control.git/91708

(My memory really sucks.)

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


* Re: subdirectory-filter does not delete files before the directory came into existence?
  2010-12-19  2:23     ` Thomas Rast
@ 2010-12-19  9:34       ` Jan Wielemaker
  2010-12-19 22:51         ` Thomas Rast
  0 siblings, 1 reply; 8+ messages in thread
From: Jan Wielemaker @ 2010-12-19  9:34 UTC
  To: Thomas Rast; +Cc: git

On Sun, 2010-12-19 at 03:23 +0100, Thomas Rast wrote:
> Jan Wielemaker wrote:
> > The reported problems also apply to the next module.  What appears to
> > work is this:
> > 
> >   * Walk through the history, finding the commit where the directory
> >   is created.
> >   * use git tag -l --contains <commit that created dir> to get the 
> >   tags we want to keep.
> >   * get all tags, use comm and delete the tags not in the `contained'
> >   set above.
> > 
> > Not very friendly, and I'm (with Thomas) unsure about the status of these
> > findings.  I would like to thank Thomas for giving me the right clue.
> 
> Now I finally remember where I knew this problem from:
> 
>   http://article.gmane.org/gmane.comp.version-control.git/91708
> 
> (My memory really sucks.)

Funny.  That was me having problems with filtering out directories
as well :-)  I thought your patch had been added as the --prune-empty
flag.  I guess you can comment on that.  I can confirm that I've got
nice and clean filtering using

  * git filter-branch --subdirectory-filter <dir> --prune-empty \
      --tag-name-filter cat -- --all
  
followed by the steps above.  I use qgit with the tree-view enabled
to find the place where the hierarchy changes from the complete one
to the only-this-dir one.  You can do a binary search for that and
you spot the exact commit easily by the gap in the history-line.  Then
I run this little bit of code:

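#!/bin/bash
# Delete every tag that does not contain the commit given as $1
# (the commit where the filtered history starts).

contains="$1"

# comm needs both lists sorted the same way.
git tag | sort > tags.all
git tag -l --contains "$contains" | sort > tags.keep

# Lines only in tags.all are tags that do not contain that commit.
comm -23 tags.all tags.keep | while read -r t; do
  git tag -d "$t"
done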

Not ideal, but doable.

	Cheers --- Jan


* Re: subdirectory-filter does not delete files before the directory came into existence?
  2010-12-19  9:34       ` Jan Wielemaker
@ 2010-12-19 22:51         ` Thomas Rast
  0 siblings, 0 replies; 8+ messages in thread
From: Thomas Rast @ 2010-12-19 22:51 UTC
  To: Jan Wielemaker; +Cc: git

Jan Wielemaker wrote:
> On Sun, 2010-12-19 at 03:23 +0100, Thomas Rast wrote:
> > Jan Wielemaker wrote:
> > >   * get all tags, use comm and delete the tags not in the `contained'
> > >   set above.
[...]
> >   http://article.gmane.org/gmane.comp.version-control.git/91708
[...]
> Funny.  That was me having problems with filtering out directories
> as well :-)  I thought your patch was added using the --prune-empty
> flag.  I guess you can comment on that.  I can confirm that I've got
> nice and clean filtering using

No, those two are rather different.  --prune-empty drops commits that
became "no-ops" in the sense that their tree is the same as their
(only) parent's.  In the case of --subdirectory-filter, --prune-empty
is most likely[*] redundant since the former already enables history
simplification limited to that directory.
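
To see which commits fit that description in a given repository,
something like the following could be used.  It is a sketch, not what
filter-branch does internally: it lists commits with exactly one parent
whose tree is identical to that parent's tree.  The function name
noop_commits is hypothetical.

```shell
# noop_commits [REV] (hypothetical helper): print commits reachable from REV
# (default HEAD) that have exactly one parent and the same tree as it.
noop_commits() {
    git rev-list --parents "${1:-HEAD}" |
    while read -r commit parent rest; do
        # Skip root commits (no parent) and merges (more than one parent).
        [ -n "$parent" ] && [ -z "$rest" ] || continue
        if [ "$(git rev-parse "$commit^{tree}")" = "$(git rev-parse "$parent^{tree}")" ]; then
            printf '%s\n' "$commit"
        fi
    done
}
```

Running it on a filtered branch with and without --prune-empty would be
one way to check whether the flag makes a difference there.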

As you can see from "TOY PATCH", my patch wasn't really meant for
application anyway.  I'm now wondering what the ramifications would
be.  filter-branch only attempts to change refs that you told it to
(listed positively on the command line), so maybe deleting anything
that was not rewritten is a sensible option (not default, mind you).


[*] Read: I think it is redundant, I'm just too lazy to double-check.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch

