Git Mailing List Archive mirror
 help / color / mirror / Atom feed
* Commit graph not using minimal number of columns
@ 2023-04-25 23:39 Javier Mora
  2023-04-25 23:50 ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Javier Mora @ 2023-04-25 23:39 UTC (permalink / raw)
  To: git

This is something that annoys me a lot whenever I'm maintaining a git
repo with two branches - a main development one and a "fork" or
alteration of the main one, which periodically merges changes from the
main one while at the same time maintaining some additional feature.

When I display the commit graph with `git log --graph` or `gitk`, this
type of graph shows up really nice, with two parallel "tracks"
corresponding to the two branches with periodic merges between them.
However, if for some reason I merge an early version of the main
branch when there's already a newer commit in that branch, then the
entire graph becomes an ugly mess that uses a lot of horizontal space,
and I lose the visual representation of "two parallel tracks".
This seems to be due to git relying on commit dates to decide which
commit goes first, rather than relying solely on topological order.

Example:
```
mkdir TEST_git
cd TEST_git
git init -b main

touch A
git add A
git commit -m 'First commit'

git switch -c fork
touch Z
git add Z
git commit -m 'Introduced feature Z'

for i in B C D E F; do
    git switch main
    touch "$i"
    git add "$i"
    git commit -m "Add $i"
    git switch fork
    git merge --no-edit main
done

git log --all --decorate --oneline --graph
# ^ displays a pretty "two-track" graph

git switch main
for i in G H; do
    touch "$i"
    git add "$i"
    git commit -m "Add $i"
done
git switch fork
git merge --no-edit main~1

git log --all --decorate --oneline --graph
# ^ displays a complete mess that doesn't resemble two tracks
```

This seems to happen because commit H was created after G but before
merging G into `fork`, and can be mitigated by updating the committer
date of H so that it is newer than the merge of G into `fork`, but
that is not always an option.  It also seems to go away after merging
H into `fork`, but that won't work if H is part of a
temporary/abandoned branch I don't plan to merge yet.  And I found
that calling `git log main --all ...` somehow fixes the graph as well.
In these cases, the option `--date-order` actually seems to yield
better graphs than the default (`--topo-order`), but that option is
not ideal, since it may interleave commits from two separate branches,
whereas the default will group commits "semantically".

What do you think?  Do you agree that this sort of thing looks
annoying?  Do you think there might be a way to solve this kind of
issue by improving the graph generation algorithm, so that it
minimizes the number of columns in the graph (for example, preferring
to display merge commits as late as possible)?  Or is this actually
intentional, because the developers consider that in this sort of
situation it would be preferable to group all the merges together?

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Commit graph not using minimal number of columns
  2023-04-25 23:39 Commit graph not using minimal number of columns Javier Mora
@ 2023-04-25 23:50 ` Junio C Hamano
  2023-04-26 10:45   ` Derrick Stolee
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2023-04-25 23:50 UTC (permalink / raw)
  To: Javier Mora; +Cc: git

Javier Mora <cousteaulecommandant@gmail.com> writes:

> git log --all --decorate --oneline --graph
> # ^ displays a complete mess that doesn't resemble two tracks

"git log --all --oneline --graph --date-order"?



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Commit graph not using minimal number of columns
  2023-04-25 23:50 ` Junio C Hamano
@ 2023-04-26 10:45   ` Derrick Stolee
  2023-04-26 16:10     ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Derrick Stolee @ 2023-04-26 10:45 UTC (permalink / raw)
  To: Junio C Hamano, Javier Mora; +Cc: git

On 4/25/2023 7:50 PM, Junio C Hamano wrote:
> Javier Mora <cousteaulecommandant@gmail.com> writes:
> 
>> git log --all --decorate --oneline --graph
>> # ^ displays a complete mess that doesn't resemble two tracks

That mess is this:

*   ac6812d (HEAD -> fork) Merge branch 'main' (early part) into fork
|\  
* \   92dbe19 Merge branch 'main' into fork
|\ \  
* \ \   c372f7b Merge branch 'main' into fork
|\ \ \  
* \ \ \   48cae29 Merge branch 'main' into fork
|\ \ \ \  
* \ \ \ \   1bd2ff0 Merge branch 'main' into fork
|\ \ \ \ \  
* \ \ \ \ \   a5ab648 Merge branch 'main' into fork
|\ \ \ \ \ \  
* | | | | | | 362d201 Introduced feature Z
| | | | | | | * d46dfcc (main) Add H
| | | | | | |/  
| | | | | | * ac40305 Add G
| | | | | |/  
| | | | | * 867e17a Add F
| | | | |/  
| | | | * 23b1d2b Add E
| | | |/  
| | | * 8c36fda Add D
| | |/  
| | * cae1373 Add C
| |/  
| * 808c738 Add B
|/  
* 20b84d5 First commit

This width is necessary to avoid crossing lines _given the order
of the commits_, which is picked independently of the graph
rendering.

> "git log --all --oneline --graph --date-order"?

Here is that output, breaking topological ties with the commit
date order:

*   ac6812d (HEAD -> fork) Merge branch 'main' (early part) into fork
|\  
| | * d46dfcc (main) Add H
| |/  
| * ac40305 Add G
* | 92dbe19 Merge branch 'main' into fork
|\| 
* |   c372f7b Merge branch 'main' into fork
|\ \  
| | * 867e17a Add F
| |/  
* |   48cae29 Merge branch 'main' into fork
|\ \  
| | * 23b1d2b Add E
| |/  
* |   1bd2ff0 Merge branch 'main' into fork
|\ \  
| | * 8c36fda Add D
| |/  
* |   a5ab648 Merge branch 'main' into fork
|\ \  
| | * cae1373 Add C
| |/  
* | 362d201 Introduced feature Z
| * 808c738 Add B
|/  
* 20b84d5 First commit

But really, the key issue is that both branches are being used
as tips. This means that in --topo-order (implied by --graph),
every commit reachable from 'fork' but not 'main' must be shown
before the first commit of 'main'.

If we only use 'fork' as a starting point, then the --topo-order
is the best of all the options:

*   ac6812d (HEAD -> fork) Merge branch 'main' (early part) into fork
|\  
| * ac40305 Add G
* | 92dbe19 Merge branch 'main' into fork
|\| 
| * 867e17a Add F
* | c372f7b Merge branch 'main' into fork
|\| 
| * 23b1d2b Add E
* | 48cae29 Merge branch 'main' into fork
|\| 
| * 8c36fda Add D
* | 1bd2ff0 Merge branch 'main' into fork
|\| 
| * cae1373 Add C
* | a5ab648 Merge branch 'main' into fork
|\| 
| * 808c738 Add B
* | 362d201 Introduced feature Z
|/  
* 20b84d5 First commit

I don't think there is anything actionable to do here, as
these commit-ordering options are well-defined and should not
be altered. If there was an algorithm to modify the commit
order in such a way that minimized the graph output, that
would be interesting, but the cases it minimizes are probably
too rare to be worth the effort.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Commit graph not using minimal number of columns
  2023-04-26 10:45   ` Derrick Stolee
@ 2023-04-26 16:10     ` Junio C Hamano
  2023-04-26 17:35       ` Derrick Stolee
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2023-04-26 16:10 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Javier Mora, git

Derrick Stolee <derrickstolee@github.com> writes:

> This width is necessary to avoid crossing lines _given the order
> of the commits_, which is picked independently of the graph
> rendering.

Thanks for giving the crucial bit.

Object listing order is determined first, and then graph algorithm
works on the series of commits that comes out of the revision
walking machinery.

> I don't think there is anything actionable to do here, as
> these commit-ordering options are well-defined and should not
> be altered. If there was an algorithm to modify the commit
> order in such a way that minimized the graph output, that
> would be interesting, but the cases it minimizes are probably
> too rare to be worth the effort.

Yes, in addition to and next to "--{topo,date}-order", if somebody
can come up with a new "--graph-friendly-order", it may be an
interesting addition.

A tangent.  I do not offhand remember if --date-order works purely
on the timestamps in the commit objects, or do we take corrections
based on the generation numbers?  It seems that we only use the
compare_commits_by_gen_then_commit_date helper for prio queue
manipulation (to avoid the "slop" thing terminating the revision
walk too early) and not actual sorting.  I wonder if it makes much
difference if we used it instead of compare_commits_by_commit_date()

Thanks.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Commit graph not using minimal number of columns
  2023-04-26 16:10     ` Junio C Hamano
@ 2023-04-26 17:35       ` Derrick Stolee
  2023-04-26 17:43         ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Derrick Stolee @ 2023-04-26 17:35 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Javier Mora, git

On 4/26/2023 12:10 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:

>> I don't think there is anything actionable to do here, as
>> these commit-ordering options are well-defined and should not
>> be altered. If there was an algorithm to modify the commit
>> order in such a way that minimized the graph output, that
>> would be interesting, but the cases it minimizes are probably
>> too rare to be worth the effort.
> 
> Yes, in addition to and next to "--{topo,date}-order", if somebody
> can come up with a new "--graph-friendly-order", it may be an
> interesting addition.
> 
> A tangent.  I do not offhand remember if --date-order works purely
> on the timestamps in the commit objects, or do we take corrections
> based on the generation numbers?  It seems that we only use the
> compare_commits_by_gen_then_commit_date helper for prio queue
> manipulation (to avoid the "slop" thing terminating the revision
> walk too early) and not actual sorting.  I wonder if it makes much
> difference if we used it instead of compare_commits_by_commit_date()

The --date-order guarantees topological relationships are respected,
which is how it is different from the default order.

For the incremental topo-order logic, the topo_queue determines the
final order (the algorithm ensures that the indegree_queue has walked
far enough that we only add commits to topo_queue if their "in
degree" is zero and thus safe to use within topological constraints).

Here is how we pick the comparison:

	switch (revs->sort_order) {
	default: /* REV_SORT_IN_GRAPH_ORDER */
		info->topo_queue.compare = NULL;
		break;
	case REV_SORT_BY_COMMIT_DATE:
		info->topo_queue.compare = compare_commits_by_commit_date;
		break;
	case REV_SORT_BY_AUTHOR_DATE:
		init_author_date_slab(&info->author_date);
		info->topo_queue.compare = compare_commits_by_author_date;
		info->topo_queue.cb_data = &info->author_date;
		break;
	}

Using NULL makes the topo_queue act as a stack instead of a queue.

But crucially, --date-order sets to compare_commits_by_commit_date, so
generation number has nothing to do with this part of the walk (it has
everything to do with indegree_queue, though).

To adapt this algorithm to a newer, dynamic ordering that cares about
minimizing the rendered graph, I don't think changing the priority
queue comparison would be sufficient. Something deeper would be
required and would be quite messy.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Commit graph not using minimal number of columns
  2023-04-26 17:35       ` Derrick Stolee
@ 2023-04-26 17:43         ` Junio C Hamano
  2023-04-27 13:02           ` Derrick Stolee
  0 siblings, 1 reply; 8+ messages in thread
From: Junio C Hamano @ 2023-04-26 17:43 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Javier Mora, git

Derrick Stolee <derrickstolee@github.com> writes:

> To adapt this algorithm to a newer, dynamic ordering that cares about
> minimizing the rendered graph, I don't think changing the priority
> queue comparison would be sufficient. Something deeper would be
> required and would be quite messy.

Oh, I wasn't thinking about "graph-friendly" at all.  

I was wondering if we replaced --date-order implementation with
corrected commit date from the generation number work would give us
a better output order that users would expect in general (with or
without --graph).



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Commit graph not using minimal number of columns
  2023-04-26 17:43         ` Junio C Hamano
@ 2023-04-27 13:02           ` Derrick Stolee
  2023-04-27 18:24             ` Junio C Hamano
  0 siblings, 1 reply; 8+ messages in thread
From: Derrick Stolee @ 2023-04-27 13:02 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Javier Mora, git

On 4/26/2023 1:43 PM, Junio C Hamano wrote:
> Derrick Stolee <derrickstolee@github.com> writes:
> 
>> To adapt this algorithm to a newer, dynamic ordering that cares about
>> minimizing the rendered graph, I don't think changing the priority
>> queue comparison would be sufficient. Something deeper would be
>> required and would be quite messy.
> 
> Oh, I wasn't thinking about "graph-friendly" at all.  
> 
> I was wondering if we replaced --date-order implementation with
> corrected commit date from the generation number work would give us
> a better output order that users would expect in general (with or
> without --graph).

Ah. My mistake.

Corrected commit date should be equal to commit date unless there
is clock skew causing a commit to be older than its parent, (or
we have a path of commits with equal commit date) so I don't
anticipate that being helpful in general.

Thanks,
-Stolee

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Commit graph not using minimal number of columns
  2023-04-27 13:02           ` Derrick Stolee
@ 2023-04-27 18:24             ` Junio C Hamano
  0 siblings, 0 replies; 8+ messages in thread
From: Junio C Hamano @ 2023-04-27 18:24 UTC (permalink / raw)
  To: Derrick Stolee; +Cc: Javier Mora, git

Derrick Stolee <derrickstolee@github.com> writes:

> Corrected commit date should be equal to commit date unless there
> is clock skew causing a commit to be older than its parent, (or
> we have a path of commits with equal commit date) so I don't
> anticipate that being helpful in general.

Yes, exactly.  For normal cases it should not matter.

I was wondering if the current --date-order shows commits in an
order that end-users find unnatural in a branchy history with many
commits that have wrong commit dates, and if so, tweaking the
date-order to instead use the corrected commit date may help and if
so how much.

Thanks.



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-04-27 18:24 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-25 23:39 Commit graph not using minimal number of columns Javier Mora
2023-04-25 23:50 ` Junio C Hamano
2023-04-26 10:45   ` Derrick Stolee
2023-04-26 16:10     ` Junio C Hamano
2023-04-26 17:35       ` Derrick Stolee
2023-04-26 17:43         ` Junio C Hamano
2023-04-27 13:02           ` Derrick Stolee
2023-04-27 18:24             ` Junio C Hamano

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).