[oe] Git Migration Status

Wed Jul 30 16:14:47 UTC 2008

I'm going to skip the rest of your argument; I don't agree with the other
bits, but this bit does lead to something important:

On Thu, 2008-07-31 at 00:42 +0930, Rod Whitby wrote:
> Richard Purdie wrote:
> > 2. The kernel uses a pull model for development and people check IDs for
> > some sanity before pulling. With the push model we're going for we don't
> > have that luxury.
> 
> That is true.  However, you also need to account for a multi-level push 
> model, where the second and subsequent levels may not have the same 
> restrictions on IDs that you are proposing for the master OE repository. 
> In this case, the author and committer will really be two different
> people, and you want to have real contact email addresses for both.

This isn't a problem in the way you describe, since nowhere did I suggest
making committer == author or requiring any changes to the author. It
does bring something else slightly related to mind though.

Say I have some tree outside OE and I commit things to it as a developer
with no direct access to OE. "We" then want to merge that into OE.
Even assuming the commit hooks allow this (there are ways and means), we
immediately lose our namespace protection. There are 101 other scenarios
that would allow this to happen, and they show that namespace protection
is pointless with git.

Note that your AUTHORS file would also break with this approach unless,
at merge time, you grep the external tree for all email addresses and
then add them, which would be tedious at best.
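To give an idea of what that merge-time step involves, here is a minimal Python sketch. It assumes the external tree's identities have already been dumped one per line, e.g. with `git log --format='%an <%ae>%n%cn <%ce>' master..external/master` (the branch names are hypothetical), and that AUTHORS lists people as `Name <email>`, which is an assumption about its format. `missing_authors` is an illustrative helper, not an existing tool.

```python
import re

# Pulls the address out of any "Name <email>" occurrence on a line.
ADDR_RE = re.compile(r"<([^<>]+)>")


def missing_authors(log_lines, authors_text):
    """Return addresses seen in the external tree's log that AUTHORS lacks.

    log_lines: iterable of "Name <email>" strings (author and committer).
    authors_text: contents of the AUTHORS file, assumed to use "Name <email>".
    Addresses are compared case-insensitively; the result is sorted.
    """
    seen = set()
    for line in log_lines:
        for addr in ADDR_RE.findall(line):
            seen.add(addr.strip().lower())
    known = {addr.strip().lower() for addr in ADDR_RE.findall(authors_text)}
    return sorted(seen - known)
```

It has to be re-run against every external tree on every merge, which is where the tedium comes from.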

I think we've lost sight of the original problems, so let's go back to
them.

The points I had in mind with this were to:

a) stop insane IDs from entering the repository (ich at 1.2.3.4)
b) gain something we could feed into the final mtn -> git and bkcvs ->
   git conversions, which would make more sense of the data.
   "crofton at openembedded.org" really isn't a useful identifier.
c) ensure we have accurate records about our committers on the wiki
d) remove the problems of associating related commit IDs, e.g. when
   someone changes email address frequently, and solve that problem once
   and for all.

The merge scenario above shows that d) is impossible, so let's forget it,
though c) below partly covers it.

For b), I'd propose making a list of the monotone IDs we have and asking
for opinions on what to do with them. I'm happy for
"rpurdie at openembedded.org" to become "Richard Purdie <rpurdie at rpsys.net>",
for example, and that will increase the usefulness of the metadata (some
username mappings are much harder to guess). I already have a list of
bkcvs IDs to monotone IDs, so this just leaves the monotone -> git
mappings. Is anyone willing to volunteer to collect a list?
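Once such a list exists, applying it during conversion is mechanical. A minimal Python sketch follows; the `old-id = Name <email>` file format and the helper names (`parse_mapping`, `map_identity`, `unmapped`) are hypothetical, and the actual conversion tools may expect a different format. `unmapped` reports which IDs still need a decision, which is the part a volunteer would chase up.

```python
def parse_mapping(text):
    """Parse a hypothetical mapping file of 'old-id = Name <email>' lines.

    Blank lines and lines starting with '#' are ignored.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        old, sep, new = line.partition("=")
        if not sep:
            raise ValueError(f"malformed mapping line: {raw!r}")
        mapping[old.strip()] = new.strip()
    return mapping


def map_identity(mapping, old_id):
    """Return the new git identity, or the old ID unchanged if unmapped."""
    return mapping.get(old_id, old_id)


def unmapped(ids, mapping):
    """Return the sorted IDs that have no entry in the mapping yet."""
    return sorted(set(ids).difference(mapping))
```

Leaving unmapped IDs unchanged keeps the conversion running while the list is still incomplete.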

For c), we should make updating the wiki part of the procedure for
having an SSH key added for access to the git server. Whilst some people
have protested in the past, I'd like to see people's email addresses
listed there, including any aliases they commit under. They're going to
appear on the web representations of git anyway. This means that if an
address is dead, we have an idea of what the live one might be. If I
remember rightly, we could also have a script collect this information
about defunct addresses from the wiki and have gitweb and cgit use it
when displaying the IDs.
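Git itself has a mechanism for mapping old addresses to a canonical identity: a `.mailmap` file whose lines take the form `Proper Name <proper@email> <commit@email>`, which `git shortlog` and the `%aN`/`%aE` log placeholders honour. A minimal Python sketch of the wiki-to-mailmap step follows. The `(name, current_email, aliases)` records stand in for whatever a wiki scraper would return and are hypothetical, and whether gitweb or cgit pick up the result depends on their configuration, which this sketch does not establish.

```python
def mailmap_lines(people):
    """Build .mailmap entries from hypothetical wiki records.

    people: iterable of (name, current_email, aliases) tuples.
    Each alias that differs from the current address becomes one
    "Name <current> <alias>" line; output is sorted for stable diffs.
    """
    lines = []
    for name, current, aliases in people:
        for alias in aliases:
            if alias.lower() != current.lower():
                lines.append(f"{name} <{current}> <{alias}>")
    return sorted(lines)
```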

For a), I guess the best we can do is add some checking for insane
commit and author IDs, such as those containing "localhost" or an IP
address, or with no name set. I'd also like to add something to the
commit policy about only committing with sane IDs; commits with invalid
IDs are at risk of being reverted.
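A minimal Python sketch of such a check, assuming identities arrive as `Name <email>` strings (for example from `git log --format='%an <%ae>'` and `'%cn <%ce>'` over the pushed range inside a server-side `update` or `pre-receive` hook; that wiring is left out and is an assumption). `identity_problems` is a hypothetical helper that returns the reasons an ID looks insane, or an empty list if it looks sane.

```python
import ipaddress
import re

# "Name <email>"; the name may be empty, which is reported separately.
IDENT_RE = re.compile(r"^\s*(?P<name>[^<>]*?)\s*<(?P<email>[^<>]*)>\s*$")
# A dotted-quad IPv4 address anywhere in the email, e.g. "ich@1.2.3.4".
DOTTED_QUAD_RE = re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b")


def _is_ip(domain):
    """True if the domain (optionally in [brackets]) is an IPv4/IPv6 literal."""
    try:
        ipaddress.ip_address(domain.strip("[]"))
        return True
    except ValueError:
        return False


def identity_problems(ident):
    """Return a list of reasons this commit/author ID looks insane."""
    m = IDENT_RE.match(ident)
    if not m:
        return ["not in 'Name <email>' form"]
    name, email = m.group("name"), m.group("email").strip()
    problems = []
    if not name:
        problems.append("no name set")
    local, at, domain = email.rpartition("@")
    if not at or not local or not domain:
        problems.append("email has no user@domain form")
    if "localhost" in email.lower():
        problems.append("contains 'localhost'")
    if DOTTED_QUAD_RE.search(email) or _is_ip(domain):
        problems.append("contains an IP address")
    return problems
```

A hook could reject the push, or merely warn, when any commit yields a non-empty list; which of the two is a policy decision rather than a technical one.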

How does that sound?

Richard