All the mail mirrored from lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
@ 2010-05-10 17:11 Finn Arne Gangstad
  2010-05-10 17:29 ` [msysGit] " Johannes Schindelin
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Finn Arne Gangstad @ 2010-05-10 17:11 UTC (permalink / raw
  To: git, msysgit; +Cc: Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov

Previously, autocrlf would only work well for normalized
repositories. Any text files that contained CRLF in the repository
would cause problems, and would be modified when handled with
core.autocrlf set.

Change autocrlf to not do any conversions to files that in the
repository already contain a CR. git with autocrlf set will never
create such a file, or change a LF only file to contain CRs, so the
(new) assumption is that if a file contains a CR, it is intentional,
and autocrlf should not change that.

The following sequence should now always be a NOP even with autocrlf
set (assuming a clean working directory):

git checkout <something>
touch *
git add -A .    (will add nothing)
git comit       (nothing to commit)

Previously this would break for any text file containing a CR

Signed-off-by: Finn Arne Gangstad <finag@pvv.org>
---

Some of you may have been folowing Eyvind's excellent thread about
trying to make end-of-line translation in git a bit smoother.

I decided to attack the problem from a different angle: Is it possible
to make autocrlf behave non-destructively for all the previous problem cases?

Stealing the problem from Eyvind's initial mail (paraphrased and
summarized a bit):

1. Setting autocrlf globally is a pain since autocrlf does not work well
   with CRLF in the repo
2. Setting it in individual repos is hard since you do it "too late"
   (the clone will get it wrong)
3. If someone checks in a file with CRLF later, you get into problems again
4. If a repository once has contained CRLF, you can't tell autocrlf
   at which commit everything is sane again
5. autocrlf does needless work if you know that all your users want
   the same EOL style.

I belive that this patch makes autocrlf a safe (and good) default
setting for Windows, and this solves problems 1-4.

I implemented it by looking for CR charactes in the index, and
aborting any conversion attempt if this is found. The code to read
the index contents was copied pretty verbatim from attr.c, and should
probably be made into a non-static function instead if there is no
better way of doing this.

Note that ALL the tests still pass unmodified. This is a bit
surprising perhaps, but think it is an indication that no one ever
intented autocrlf to do what it does to files containing CRs.


 convert.c |   45 +++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 45 insertions(+), 0 deletions(-)

diff --git a/convert.c b/convert.c
index 4f8fcb7..9d062c8 100644
--- a/convert.c
+++ b/convert.c
@@ -120,6 +120,43 @@ static void check_safe_crlf(const char *path, int action,
 	}
 }
 
+static int has_cr_in_index(const char *path)
+{
+	int pos, len;
+	unsigned long sz;
+	enum object_type type;
+	void *data;
+	int has_cr;
+	struct index_state *istate = &the_index;
+
+	len = strlen(path);
+	pos = index_name_pos(istate, path, len);
+	if (pos < 0) {
+		/*
+		 * We might be in the middle of a merge, in which
+		 * case we would read stage #2 (ours).
+		 */
+		int i;
+		for (i = -pos - 1;
+		     (pos < 0 && i < istate->cache_nr &&
+		      !strcmp(istate->cache[i]->name, path));
+		     i++)
+			if (ce_stage(istate->cache[i]) == 2)
+				pos = i;
+	}
+	if (pos < 0)
+		return 0;
+	data = read_sha1_file(istate->cache[pos]->sha1, &type, &sz);
+	if (!data || type != OBJ_BLOB) {
+		free(data);
+		return 0;
+	}
+
+	has_cr = memchr(data, '\r', sz) != NULL;
+	free(data);
+	return has_cr;
+}
+
 static int crlf_to_git(const char *path, const char *src, size_t len,
                        struct strbuf *buf, int action, enum safe_crlf checksafe)
 {
@@ -147,6 +184,10 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
 			return 0;
 	}
 
+	/* If the file in the index has any CR in it, do not convert. */
+	if (has_cr_in_index(path))
+		return 0;
+
 	check_safe_crlf(path, action, &stats, checksafe);
 
 	/* Optimization: No CR? Nothing to convert, regardless. */
@@ -202,6 +243,10 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
 	if (stats.lf == stats.crlf)
 		return 0;
 
+	/* Are there ANY lines at all with CRLF? If so, ignore */
+	if (stats.crlf > 0)
+		return 0;
+
 	if (action == CRLF_GUESS) {
 		/* If we have any bare CR characters, we're not going to touch it */
 		if (stats.cr != stats.crlf)
-- 
1.7.1.1.g653e8

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [msysGit] [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 17:11 [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
@ 2010-05-10 17:29 ` Johannes Schindelin
  2010-05-10 18:48   ` Junio C Hamano
  2010-05-11 22:28   ` Finn Arne Gangstad
  2010-05-10 19:09 ` Eyvind Bernhardsen
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 10+ messages in thread
From: Johannes Schindelin @ 2010-05-10 17:29 UTC (permalink / raw
  To: Finn Arne Gangstad
  Cc: git, msysgit, Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov

Hi Finn Arne,

this is great stuff!

On Mon, 10 May 2010, Finn Arne Gangstad wrote:

> Previously, autocrlf would only work well for normalized
> repositories. Any text files that contained CRLF in the repository
> would cause problems, and would be modified when handled with
> core.autocrlf set.
> 
> Change autocrlf to not do any conversions to files that in the
> repository already contain a CR. git with autocrlf set will never
> create such a file, or change a LF only file to contain CRs, so the
> (new) assumption is that if a file contains a CR, it is intentional,
> and autocrlf should not change that.
> 
> The following sequence should now always be a NOP even with autocrlf
> set (assuming a clean working directory):
> 
> git checkout <something>
> touch *
> git add -A .    (will add nothing)
> git comit       (nothing to commit)

s/comit/commit/

> Previously this would break for any text file containing a CR
> 
> Signed-off-by: Finn Arne Gangstad <finag@pvv.org>
> ---
> 
> Some of you may have been folowing Eyvind's excellent thread about
> trying to make end-of-line translation in git a bit smoother.
> 
> I decided to attack the problem from a different angle: Is it possible
> to make autocrlf behave non-destructively for all the previous problem cases?
> 
> Stealing the problem from Eyvind's initial mail (paraphrased and
> summarized a bit):
> 
> 1. Setting autocrlf globally is a pain since autocrlf does not work well
>    with CRLF in the repo
> 2. Setting it in individual repos is hard since you do it "too late"
>    (the clone will get it wrong)
> 3. If someone checks in a file with CRLF later, you get into problems again
> 4. If a repository once has contained CRLF, you can't tell autocrlf
>    at which commit everything is sane again
> 5. autocrlf does needless work if you know that all your users want
>    the same EOL style.
> 
> I belive that this patch makes autocrlf a safe (and good) default
> setting for Windows, and this solves problems 1-4.
> 
> I implemented it by looking for CR charactes in the index, and
> aborting any conversion attempt if this is found. The code to read
> the index contents was copied pretty verbatim from attr.c, and should
> probably be made into a non-static function instead if there is no
> better way of doing this.

One technical question, see below.

> Note that ALL the tests still pass unmodified. This is a bit
> surprising perhaps, but think it is an indication that no one ever
> intented autocrlf to do what it does to files containing CRs.

Indeed. But a test of its own would be nice, no? If you do not have time, 
I will come up with one.

BTW all this technical description after the "---" should probably go into 
the commit message.

> diff --git a/convert.c b/convert.c
> index 4f8fcb7..9d062c8 100644
> --- a/convert.c
> +++ b/convert.c
> @@ -120,6 +120,43 @@ static void check_safe_crlf(const char *path, int action,
>  	}
>  }
>  
> +static int has_cr_in_index(const char *path)
> +{
> +	int pos, len;
> +	unsigned long sz;
> +	enum object_type type;
> +	void *data;
> +	int has_cr;
> +	struct index_state *istate = &the_index;
> +
> +	len = strlen(path);
> +	pos = index_name_pos(istate, path, len);
> +	if (pos < 0) {
> +		/*
> +		 * We might be in the middle of a merge, in which
> +		 * case we would read stage #2 (ours).
> +		 */
> +		int i;
> +		for (i = -pos - 1;
> +		     (pos < 0 && i < istate->cache_nr &&
> +		      !strcmp(istate->cache[i]->name, path));
> +		     i++)
> +			if (ce_stage(istate->cache[i]) == 2)
> +				pos = i;
> +	}

I think it makes sense to assume that "ours" determines whether we should 
assume that the index has a wrong format. But if there is also a "base" 
that disagrees on CR-ness with "ours", should we not try to pick "ours"?

Ciao,
Johannes

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [msysGit] [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 17:29 ` [msysGit] " Johannes Schindelin
@ 2010-05-10 18:48   ` Junio C Hamano
  2010-05-11 22:28   ` Finn Arne Gangstad
  1 sibling, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2010-05-10 18:48 UTC (permalink / raw
  To: Johannes Schindelin
  Cc: Finn Arne Gangstad, git, msysgit, Eyvind Bernhardsen,
	Dmitry Potapov

Johannes Schindelin <Johannes.Schindelin@gmx.de> writes:

>> Note that ALL the tests still pass unmodified. This is a bit
>> surprising perhaps, but think it is an indication that no one ever
>> intented autocrlf to do what it does to files containing CRs.
>
> Indeed. But a test of its own would be nice, no? If you do not have time, 
> I will come up with one.

Thanks for a review.  This is a good stuff.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 17:11 [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
  2010-05-10 17:29 ` [msysGit] " Johannes Schindelin
@ 2010-05-10 19:09 ` Eyvind Bernhardsen
  2010-05-10 19:43 ` Dmitry Potapov
  2010-05-10 20:30 ` Jakub Narebski
  3 siblings, 0 replies; 10+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-10 19:09 UTC (permalink / raw
  To: Finn Arne Gangstad; +Cc: git, msysgit, Junio C Hamano, Dmitry Potapov

On 10. mai 2010, at 19.11, Finn Arne Gangstad wrote:

> Previously, autocrlf would only work well for normalized
> repositories. Any text files that contained CRLF in the repository
> would cause problems, and would be modified when handled with
> core.autocrlf set.
> 
> Change autocrlf to not do any conversions to files that in the
> repository already contain a CR. git with autocrlf set will never
> create such a file, or change a LF only file to contain CRs, so the
> (new) assumption is that if a file contains a CR, it is intentional,
> and autocrlf should not change that.

I'm of two minds about this: on the one hand, it appears to fix autocrlf's biggest problem (that it breaks down when the repository is not normalized), which was the main reason I started working on it in the first place.  On the other hand, it does nothing for the user interface, which was (obviously :) another big motivator.

I'll submit a cleaned-up series with optional extras in a couple of days.
-- 
Eyvind

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 17:11 [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
  2010-05-10 17:29 ` [msysGit] " Johannes Schindelin
  2010-05-10 19:09 ` Eyvind Bernhardsen
@ 2010-05-10 19:43 ` Dmitry Potapov
  2010-05-11 16:31   ` Eyvind Bernhardsen
  2010-05-10 20:30 ` Jakub Narebski
  3 siblings, 1 reply; 10+ messages in thread
From: Dmitry Potapov @ 2010-05-10 19:43 UTC (permalink / raw
  To: Finn Arne Gangstad; +Cc: git, msysgit, Eyvind Bernhardsen, Junio C Hamano

On Mon, May 10, 2010 at 07:11:19PM +0200, Finn Arne Gangstad wrote:

> 
> Stealing the problem from Eyvind's initial mail (paraphrased and
> summarized a bit):
> 
> 1. Setting autocrlf globally is a pain since autocrlf does not work well
>    with CRLF in the repo
> 2. Setting it in individual repos is hard since you do it "too late"
>    (the clone will get it wrong)
> 3. If someone checks in a file with CRLF later, you get into problems again
> 4. If a repository once has contained CRLF, you can't tell autocrlf
>    at which commit everything is sane again
> 5. autocrlf does needless work if you know that all your users want
>    the same EOL style.
> 
> I belive that this patch makes autocrlf a safe (and good) default
> setting for Windows, and this solves problems 1-4.

It does not really solve #2, because you will have the wrong ending for
files that must be LF, such as shell scripts, and then these scripts
fail with some weird error...

> 
> I implemented it by looking for CR charactes in the index, and
> aborting any conversion attempt if this is found.

Does it have any measurable impact on the check-in when a lot of files
are committed? 

> 
> Note that ALL the tests still pass unmodified. This is a bit
> surprising perhaps, but think it is an indication that no one ever
> intented autocrlf to do what it does to files containing CRs.

Well, tests do not cover many corner cases... So, no surprise here...

> @@ -147,6 +184,10 @@ static int crlf_to_git(const char *path, const char *src, size_t len,
>                        return 0;
>        }
>
> +       /* If the file in the index has any CR in it, do not convert. */
> +       if (has_cr_in_index(path))
> +               return 0;
> +

Why do you disable crlf conversion not only for "guess" case but also
for those files that have the "crlf" attribute? Moreover, you do that
silently without even a warning to the user. IMHO, it is incompatible
change. In fact, it can seen as regression, because by specifying the
correct attribute for that file, I could fix the ending of this file.
Now, this is impossible.

>  
>  	/* Optimization: No CR? Nothing to convert, regardless. */
> @@ -202,6 +243,10 @@ static int crlf_to_worktree(const char *path, const char *src, size_t len,
>  	if (stats.lf == stats.crlf)
>  		return 0;
>  
> +	/* Are there ANY lines at all with CRLF? If so, ignore */
> +	if (stats.crlf > 0)
> +		return 0;
> +
>  	if (action == CRLF_GUESS) {
>  		/* If we have any bare CR characters, we're not going to touch it */
>  		if (stats.cr != stats.crlf)

This chunk does not make sense. Can you explain what did you try to
achieve here for guess and non-guess cases?

IMHO, we should conservative in our changes. So, to change behavior of
autocrlf only where "crlf" is not set explicitly. Or, at least, produce
some warning about discrepancy between what the user has _explicitly_
told to do and what git does. I really dislike that any tool silently
ignores users explicit instructions, because it thinks it knows better
than a human. It is plainly wrong.


Dmitry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 17:11 [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
                   ` (2 preceding siblings ...)
  2010-05-10 19:43 ` Dmitry Potapov
@ 2010-05-10 20:30 ` Jakub Narebski
  2010-05-10 21:17   ` Dmitry Potapov
  2010-05-11 22:52   ` Finn Arne Gangstad
  3 siblings, 2 replies; 10+ messages in thread
From: Jakub Narebski @ 2010-05-10 20:30 UTC (permalink / raw
  To: Finn Arne Gangstad
  Cc: git, msysgit, Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov

Finn Arne Gangstad <finnag@pvv.org> writes:

> Previously, autocrlf would only work well for normalized
> repositories. Any text files that contained CRLF in the repository
> would cause problems, and would be modified when handled with
> core.autocrlf set.
> 
> Change autocrlf to not do any conversions to files that in the
> repository already contain a CR. git with autocrlf set will never
> create such a file, or change a LF only file to contain CRs, so the
> (new) assumption is that if a file contains a CR, it is intentional,
> and autocrlf should not change that.
> 
> The following sequence should now always be a NOP even with autocrlf
> set (assuming a clean working directory):
> 
> git checkout <something>
> touch *
> git add -A .    (will add nothing)
> git comit       (nothing to commit)
> 
> Previously this would break for any text file containing a CR

How this feature relates to `core.safecrfl'?

-- 
Jakub Narebski
Poland
ShadeHawk on #git

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 20:30 ` Jakub Narebski
@ 2010-05-10 21:17   ` Dmitry Potapov
  2010-05-11 22:52   ` Finn Arne Gangstad
  1 sibling, 0 replies; 10+ messages in thread
From: Dmitry Potapov @ 2010-05-10 21:17 UTC (permalink / raw
  To: Jakub Narebski
  Cc: Finn Arne Gangstad, git, msysgit, Eyvind Bernhardsen,
	Junio C Hamano

On Mon, May 10, 2010 at 01:30:22PM -0700, Jakub Narebski wrote:
> Finn Arne Gangstad <finnag@pvv.org> writes:
> > 
> > The following sequence should now always be a NOP even with autocrlf
> > set (assuming a clean working directory):
> > 
> > git checkout <something>
> > touch *
> > git add -A .    (will add nothing)
> > git comit       (nothing to commit)
> > 
> > Previously this would break for any text file containing a CR
> 
> How this feature relates to `core.safecrfl'?

safecrlf is about making sure that you will get back exactly same file
on the next checkout as you have now in your working directory unless
you change your autocrlf value. So, the statement about breaking any
file with CR is certainly not correct.

This feature is about preserving CRLF in files inside of the repository
that were committed with CRLF despite that your current settings that
suggests that those files should be committed with LF. This makes sense
for the "guess" case, but it is clearly wrong when the user explicitly
mandated CRLF conversion for that file through attributes.


Dmitry

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 19:43 ` Dmitry Potapov
@ 2010-05-11 16:31   ` Eyvind Bernhardsen
  0 siblings, 0 replies; 10+ messages in thread
From: Eyvind Bernhardsen @ 2010-05-11 16:31 UTC (permalink / raw
  To: Dmitry Potapov; +Cc: Finn Arne Gangstad, git, msysgit, Junio C Hamano

On 10. mai 2010, at 21.43, Dmitry Potapov wrote:

> It does not really solve #2, because you will have the wrong ending for
> files that must be LF, such as shell scripts, and then these scripts
> fail with some weird error...

That's okay, because I'll solve that in my next iteration with "crlf=crlf" and "crlf=lf" (yes, I know).  Finn Arne's patch is about fixing the current broken behaviour when autocrlf is enabled in a repository containing CRLFs, while mine's the one you need if you want normalized text files in your repository.
-- 
Eyvind

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [msysGit] [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 17:29 ` [msysGit] " Johannes Schindelin
  2010-05-10 18:48   ` Junio C Hamano
@ 2010-05-11 22:28   ` Finn Arne Gangstad
  1 sibling, 0 replies; 10+ messages in thread
From: Finn Arne Gangstad @ 2010-05-11 22:28 UTC (permalink / raw
  To: Johannes Schindelin
  Cc: git, msysgit, Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov

On Mon, May 10, 2010 at 07:29:34PM +0200, Johannes Schindelin wrote:

> > +			if (ce_stage(istate->cache[i]) == 2)
> > +				pos = i;
> > +	}
> 
> I think it makes sense to assume that "ours" determines whether we should 
> assume that the index has a wrong format. But if there is also a "base" 
> that disagrees on CR-ness with "ours", should we not try to pick "ours"?

Did you make a typo there, or did I misunderstand something? As far as
I can tell we do pick "ours" in this case, and I think that may be the
best choice overall (it is easiest for you to fix "ours" if the
result is wrong).

- Finn Arne

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories
  2010-05-10 20:30 ` Jakub Narebski
  2010-05-10 21:17   ` Dmitry Potapov
@ 2010-05-11 22:52   ` Finn Arne Gangstad
  1 sibling, 0 replies; 10+ messages in thread
From: Finn Arne Gangstad @ 2010-05-11 22:52 UTC (permalink / raw
  To: Jakub Narebski
  Cc: git, msysgit, Eyvind Bernhardsen, Junio C Hamano, Dmitry Potapov

On Mon, May 10, 2010 at 01:30:22PM -0700, Jakub Narebski wrote:
> Finn Arne Gangstad <finnag@pvv.org> writes:
> 
> How this feature relates to `core.safecrfl'?

safecrlf is about checking what you add, it does not concern itself
about what is already in the repository as far as I can tell.

The safecrlf check will only be done if autocrlf converts a file.  For
new files autocrlf is unchanged, and safecrlf will complain or die as
before.  For existing files, you will only be able to get autocrlf to
do anything (and thus do the safecrlf check) if the files are
normalized to LF only in the repo, or if you have set the crlf
attribute on them.

- Finn Arne

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2010-05-11 22:52 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-05-10 17:11 [PATCH/RFC] autocrlf: Make it work also for un-normalized repositories Finn Arne Gangstad
2010-05-10 17:29 ` [msysGit] " Johannes Schindelin
2010-05-10 18:48   ` Junio C Hamano
2010-05-11 22:28   ` Finn Arne Gangstad
2010-05-10 19:09 ` Eyvind Bernhardsen
2010-05-10 19:43 ` Dmitry Potapov
2010-05-11 16:31   ` Eyvind Bernhardsen
2010-05-10 20:30 ` Jakub Narebski
2010-05-10 21:17   ` Dmitry Potapov
2010-05-11 22:52   ` Finn Arne Gangstad

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.