I appreciate that on a site as massive and complex as GitHub, it's not as simple as just writing some code to garbage collect commits and drop them. It could have far-reaching consequences.
However, GitHub have had so many years to correct this behaviour. This is something they should have been working towards from the moment it was clear this is risky behaviour.
I don't really understand what's going on over there at GitHub. Everything besides their AI stuff feels like it's being run by a skeleton crew with only enough capacity to keep things running.
"If you’re faced with the tradeoff between security and another priority, your answer is clear: Do security" -- what happened to that?
I really don't see how GitHub can justify allowing public access to dangling commits.
Surely they have a whole army of paying customers demanding proper data deletion ability (required for all kinds of legal reasons, e.g. we accidentally committed code we don't have a licence for, or PII of Europeans, etc.).
A simple rule saying 'any commit which ever has a refcount of zero will become forever inaccessible unless reuploaded' would do the trick.
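To make that rule concrete, here is a rough sketch of how it could behave. This is a toy model, not how GitHub stores anything: `CommitStore`, `upload`, `set_ref` and `fetch` are made-up names, and "refcount of zero" is read here as "not reachable from any branch".

```python
class CommitStore:
    """Toy model of a hosted repo: commits, branch refs, and a tombstone set."""

    def __init__(self):
        self.parents = {}        # commit id -> list of parent ids
        self.refs = {}           # branch name -> tip commit id
        self.tombstoned = set()  # ids that were unreachable at some point

    def upload(self, commit_id, parents=()):
        # Uploading (or re-uploading) a commit lifts any tombstone on it.
        self.parents[commit_id] = list(parents)
        self.tombstoned.discard(commit_id)

    def reachable(self):
        # Walk parent links from every branch tip.
        seen, stack = set(), list(self.refs.values())
        while stack:
            commit = stack.pop()
            if commit not in seen:
                seen.add(commit)
                stack.extend(self.parents.get(commit, []))
        return seen

    def set_ref(self, name, commit_id):
        # Models a push, including a force-push that drops old commits.
        self.refs[name] = commit_id
        self.tombstoned |= set(self.parents) - self.reachable()

    def fetch(self, commit_id):
        # Tombstoned commits are refused even if the object still exists.
        if commit_id in self.tombstoned or commit_id not in self.parents:
            raise KeyError(commit_id)
        return self.parents[commit_id]
```

In this sketch, force-pushing `main` from A2 to a new commit B1 tombstones A2 immediately, so `fetch("A2")` fails from then on until someone uploads A2 again. The object does not have to be physically deleted for the access check to work.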
I believe git gc after force-push will remedy the situation by deleting all unreachable objects, even circular references.
Locally yes, but not on GitHub.
How would you ever end up with a circular reference?
I'm not sure... What if commits erased through forced pushes retain their ancestry? For example, branchA has commit objects A1->A2 while branchB has the same commit objects A2->A1, and then both branches are force-pushed to erase A1 and A2. Would that count as a circular reference?
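For what it's worth, a commit's id is a hash over its contents, and those contents include the parent ids, so "the same" A1 cannot gain A2 as a parent without becoming a different commit. A small sketch of that, using a simplified stand-in for Git's hashing (`commit_id` and `unreachable` are illustrative helpers, not Git functions):

```python
import hashlib


def commit_id(message, parents=()):
    """Simplified stand-in for Git's commit hashing: the id covers the parent ids."""
    body = "".join(f"parent {p}\n" for p in parents) + "\n" + message
    return hashlib.sha1(body.encode()).hexdigest()


def unreachable(parents_of, ref_tips):
    """Mark everything reachable from the ref tips, then report the rest."""
    seen, stack = set(), list(ref_tips)
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents_of.get(commit, []))
    return set(parents_of) - seen


a1 = commit_id("first")
a2 = commit_id("second", [a1])

# Pointing "first" back at a2 yields a new id, so it is a new commit, not a cycle.
a1_prime = commit_id("first", [a2])
print(a1_prime != a1)  # True

# Even a hand-built cycle (hypothetical) is swept once no ref points into it.
cyclic = {"A1": ["A2"], "A2": ["A1"], "B1": []}
print(sorted(unreachable(cyclic, ["B1"])))  # ['A1', 'A2']
```

The second half is the point about garbage collection: a collector that marks from the refs, rather than counting references, drops a cycle as soon as nothing reachable points into it.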