mirror of
https://github.com/actions/checkout.git
synced 2026-03-10 16:34:26 +00:00
Rewrite ADR
This commit is contained in:
parent
ed69f3bbdd
commit
9be4f3c9fd
@ -1,37 +1,127 @@
|
||||
# Reference Cache für schnelle Checkouts
|
||||
# ADR 2303: Reference cache for faster checkouts
|
||||
|
||||
## Zusammenfassung
|
||||
Einführung eines lokal verwalteten Git-Referenz-Caches für Haupt-Repositories und Submodule, um Netzwerk-Traffic und Checkout-Zeiten auf persistenten Runnern (z.B. Self-Hosted) massiv zu reduzieren.
|
||||
**Date**: 2026-03-10
|
||||
|
||||
## Implementierungsplan
|
||||
**Status**: Proposed
|
||||
|
||||
1. **Inputs:**
|
||||
- In `action.yml` einen neuen Input `reference-cache` (Pfad zum Cache-Verzeichnis) hinzufügen. Default ist leer.
|
||||
- In `src/git-source-settings.ts` und `src/input-helper.ts` den Input auslesen und bereitstellen (`settings.referenceCache`).
|
||||
## Context
|
||||
|
||||
2. **Cache Manager (`src/git-cache-helper.ts`):**
|
||||
- Eine neue Klasse/Helper-Logik, die das Erstellen (`git clone --bare`) und Aktualisieren (`git fetch --force`) von Bare Cache-Repos übernimmt.
|
||||
- **Namenskonvention Cache-Verzeichnis:** Damit Admin-Lesbarkeit und Kollisionsfreiheit gewährleistet sind, wird das Cache-Verzeichnis aus der Repository-URL gebildet:
|
||||
- Alle Sonderzeichen in der URL durch `_` ersetzen.
|
||||
- Ein kurzer Hash (z. B. erste 8 Zeichen des SHA256) der echten URL zur Eindeutigkeit anhängen.
|
||||
- Beispiel: `<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git`
|
||||
Repeated checkouts of the same repositories are expensive on runners with persistent storage.
|
||||
This is especially noticeable for self-hosted runners and custom runner images that execute
|
||||
many jobs against the same repositories and submodules.
|
||||
|
||||
3. **Haupt-Repo Checkout (`src/git-source-provider.ts`):**
|
||||
- Vor dem Setup des Checkouts prüfen, ob `reference-cache` gesetzt ist.
|
||||
- Wenn ja: den Cache-Ordner für die Haupt-URL aktualisieren/anlegen.
|
||||
- Nach dem initialen `git.init()` den Pfad in `.git/objects/info/alternates` schreiben, der auf das `objects`-Verzeichnis des Cache-Ordners zeigt.
|
||||
Today, each checkout fetches objects from the remote even when the runner already has most of
|
||||
the repository history available locally from previous jobs. This increases network traffic,
|
||||
slows down checkout time, and makes recursive submodule initialization more expensive than
|
||||
necessary.
|
||||
|
||||
4. **Submodule Checkouts (Iterativ statt monolithisch):**
|
||||
- Der aktuelle Befehl `git submodule update --recursive` funktioniert nicht out-of-the-box mit `reference`, wenn jedes Submodul seinen individuellen Referenz-Cache benötigt.
|
||||
- Wenn `reference-cache` aktiv ist und Submodule initialisiert werden sollen:
|
||||
- Lese `.gitmodules` aus (alle Sub-URLs ermitteln).
|
||||
- Für jedes Submodul den Cache (genauso wie in Step 2) anlegen oder aktualisieren.
|
||||
- Submodul einzeln auschecken per `git submodule update --init --reference <cache-pfad/.git> <pfad>`.
|
||||
- Bei der Einstellung `recursive`: In jedes Submodul-Verzeichnis wechseln und den Vorgang für `.gitmodules` rekursiv auf Skript-Ebene durchführen (anstatt Git's `--recursive` Flag einfach weiterzugeben).
|
||||
Git supports reference repositories and alternates, which allow one working repository to reuse
|
||||
objects from another local repository. This mechanism is a good fit for persistent runners,
|
||||
provided the cache is managed safely and works for both the main repository and submodules.
|
||||
|
||||
## Akzeptanzkriterien
|
||||
1. **Neue Option konfigurierbar**: Der Input `reference-cache` kann übergeben werden, der Code reagiert darauf.
|
||||
2. **Ordnerstruktur korrekt**: Der Cache-Ordner für das Hauptrepo und Submodule erhält Namen nach der "URL_Sonderzeichen_Ersetzt+SHA_Cut"-Logik.
|
||||
3. **Bandbreite gespart / Alternates genutzt**: Beim Hauptcheckout wird eine `.git/objects/info/alternates`-Datei mit Pfad zum lokalen Cache erzeugt. Danach ausgeführte `git fetch`-Befehle sind signifikant schneller bzw. laden deutlich weniger Bytes herunter.
|
||||
4. **Submodule erhalten Caches**: Auch tiefe (rekursive) Submodule profitieren für deren jeweilige Remote-URL vom Cache, da pro Submodul ein passender `--reference` Punkt dynamisch berechnet und übergeben wird.
|
||||
5. **Kein --dissociate**: Aus Performance-Gründen bleibt der Arbeitsordner an den Cache gebunden (`git repack` ist zeitaufwändig). Fällt der Cache weg, muss der Workspace erst einmal neu erzeugt werden (was bei Action Runnern die Norm ist, falls es nicht ohnehin "single-use" Runner sind).
|
||||
## Decision
|
||||
|
||||
Add an optional `reference-cache` input that points to a local directory used to store managed
|
||||
bare repositories for the primary repository and its submodules.
|
||||
|
||||
### Input
|
||||
|
||||
Add a new input in `action.yml`:
|
||||
|
||||
```yaml
|
||||
reference-cache:
|
||||
description: >
|
||||
Path to a local directory used as a reference cache for Git clones.
|
||||
```
|
||||
|
||||
The value is exposed through `settings.referenceCache`.
|
||||
|
||||
### Cache layout
|
||||
|
||||
Each cached repository is stored as a bare repository inside the configured cache directory.
|
||||
|
||||
The cache directory name is derived from the repository URL by:
|
||||
|
||||
- replacing non-alphanumeric characters with `_`
|
||||
- appending a short SHA-256 hash of the original URL to avoid collisions
|
||||
|
||||
Example:
|
||||
|
||||
```text
|
||||
<reference-cache>/https___github_com_actions_checkout_8f9b1c2a.git
|
||||
```
|
||||
|
||||
### Cache lifecycle
|
||||
|
||||
Introduce helper logic in `src/git-cache-helper.ts` responsible for:
|
||||
|
||||
- creating a bare cache repository with `git clone --bare`
|
||||
- updating an existing bare cache repository with `git fetch --force`
|
||||
- serializing access with file-based locking so concurrent jobs do not corrupt the cache
|
||||
- using a temporary clone-and-rename flow to avoid leaving behind partial repositories
|
||||
|
||||
### Main repository checkout
|
||||
|
||||
When `reference-cache` is configured:
|
||||
|
||||
- prepare or update the cache for the main repository URL
|
||||
- configure the checkout repository to use the cache through Git alternates
|
||||
- keep the working repository attached to the cache instead of dissociating it
|
||||
|
||||
This allows later fetch operations to reuse local objects instead of downloading them again.
|
||||
|
||||
### Submodules
|
||||
|
||||
When submodules are enabled together with `reference-cache`, submodules are processed one by one
|
||||
instead of relying solely on a monolithic `git submodule update --recursive` flow.
|
||||
|
||||
For each submodule:
|
||||
|
||||
- read the submodule URL from `.gitmodules`
|
||||
- resolve relative URLs where possible
|
||||
- create or update a dedicated cache for that submodule repository
|
||||
- run `git submodule update --init --reference <cache> <path>` for that submodule
|
||||
|
||||
When recursive submodules are requested, repeat the same process inside each initialized submodule.
|
||||
|
||||
### Fetch depth behavior
|
||||
|
||||
When `reference-cache` is enabled, shallow fetches are usually counterproductive because object
|
||||
negotiation overhead can outweigh the benefit of a local object store.
|
||||
|
||||
For that reason:
|
||||
|
||||
- the default `fetch-depth` is overridden to `0` when `reference-cache` is enabled
|
||||
- if the user explicitly sets `fetch-depth`, keep the user-provided value and emit a warning
|
||||
|
||||
### No `--dissociate`
|
||||
|
||||
The checkout should remain connected to the reference cache.
|
||||
|
||||
Using `--dissociate` would copy objects into the working repository and typically require extra
|
||||
repacking work, which reduces the performance benefit of the cache. If the cache is removed, the
|
||||
workspace is expected to be recreated, which is acceptable for the target runner scenarios.
|
||||
|
||||
## Consequences
|
||||
|
||||
### Positive
|
||||
|
||||
- reduces network traffic for repeated checkouts on persistent runners
|
||||
- improves checkout performance for the main repository and submodules
|
||||
- reuses standard Git mechanisms instead of introducing a custom object store
|
||||
- keeps cache naming deterministic and readable for administrators
|
||||
|
||||
### Trade-offs
|
||||
|
||||
- adds cache management complexity, including locking and recovery from interrupted operations
|
||||
- submodule handling becomes more complex because each submodule may require its own cache
|
||||
- benefits are limited on ephemeral runners, where the cache is not reused across jobs
|
||||
- workspaces remain dependent on the presence of the cache until they are recreated
|
||||
|
||||
## Acceptance criteria
|
||||
|
||||
1. The `reference-cache` input can be configured and is exposed through the action settings.
|
||||
2. Cache directories for the main repository and submodules follow the sanitized-URL-plus-hash naming scheme.
|
||||
3. The main checkout uses Git alternates so later fetches can reuse local cached objects.
|
||||
4. Submodules, including recursive submodules, can use repository-specific caches.
|
||||
5. The checkout does not use `--dissociate` and remains attached to the cache for performance.
|
||||
|
||||
Loading…
Reference in New Issue
Block a user