diff --git a/openspec/changes/batch-pacman-checks/.openspec.yaml b/openspec/changes/batch-pacman-checks/.openspec.yaml
new file mode 100644
index 0000000..76a85e8
--- /dev/null
+++ b/openspec/changes/batch-pacman-checks/.openspec.yaml
@@ -0,0 +1,2 @@
+schema: spec-driven
+created: 2026-04-14
diff --git a/openspec/changes/batch-pacman-checks/design.md b/openspec/changes/batch-pacman-checks/design.md
new file mode 100644
index 0000000..df8fb54
--- /dev/null
+++ b/openspec/changes/batch-pacman-checks/design.md
@@ -0,0 +1,137 @@
+## Context
+
+Currently, `pkg/pacman/pacman.go` uses subprocess calls to query pacman for package existence:
+- `pacman -Qip` to check local DB (per package)
+- `pacman -Sip` to check sync repos (per package)
+
+For n packages, this spawns 2n subprocesses (up to ~300 for typical package lists). Each subprocess has fork/exec overhead, making this the primary performance bottleneck.
+
+The AUR queries are already batched (single HTTP POST with all package names), which is the desired pattern.
+
+## Goals / Non-Goals
+
+**Goals:**
+- Eliminate subprocess overhead for local/sync DB package lookups
+- Maintain batched AUR HTTP calls (single request per batch)
+- Track installed status per package in PackageInfo
+- Provide dry-run output showing exact packages to install/remove
+- Handle orphan cleanup correctly
+
+**Non-Goals:**
+- Parallel AUR builds (still sequential)
+- Custom pacman transaction handling (use system pacman)
+- Repository configuration changes
+- Package download/compile optimization
+
+## Decisions
+
+### 1. Use Jguer/dyalpm for DB access
+
+**Decision**: Use `github.com/Jguer/dyalpm` library instead of spawning subprocesses.
+
+**Rationale**:
+- Direct libalpm access (same backend as pacman)
+- Already Go-native with proper type safety
+- Supports batch operations via `GetPkgCache()` and `PkgCache()` iterators
+
+**Alternatives considered**:
+- Parse `pacman -Qs` output - fragile, still subprocess-based
+- Write custom libalpm bindings - unnecessary effort
+
+### 2. Single-pass package resolution algorithm
+
+**Decision**: Process all packages through local DB → sync DBs → AUR in one pass.
+
+```
+For each package in collected state:
+  1. Check local DB (batch lookup) → if found, mark Installed=true
+  2. If not local, check all sync DBs (batch lookup per repo)
+  3. If not in sync, append to AUR batch
+
+Batch query AUR with all remaining packages
+Return an error if any package is not found in local/sync/AUR
+
+Collect installed status from local DB
+(Perform sync operations - skip in dry-run)
+(Mark ALL currently installed packages as deps - skip in dry-run)
+(Then mark collected state packages as explicit - skip in dry-run)
+(Cleanup orphans - skip in dry-run)
+Output summary
+```
+
+**Rationale**:
+- Single iteration over packages
+- Batch DB lookups minimize libalpm calls
+- Clear error handling for missing packages
+- Consistent with existing behavior
+
+### 3. Batch local/sync DB lookup implementation
+
+**Decision**: For local DB, iterate `localDB.PkgCache()` once and build a map. For sync DBs, iterate each repo's `PkgCache()`.
+
+**Implementation**:
+```go
+// Build local package map in one pass
+localPkgs := make(map[string]bool)
+localDB.PkgCache().ForEach(func(pkg alpm.Package) error {
+	localPkgs[pkg.Name()] = true
+	return nil
+})
+
+// Similarly for each sync DB
+for _, syncDB := range syncDBs {
+	syncDB.PkgCache().ForEach(...)
+}
+```
+
+**Rationale**:
+- O(n) iteration where n = total packages in DB (not n queries)
+- Single map construction, O(1) lookups per state package
+- libalpm iterators are already lazy, so iteration adds no extra overhead
+
+### 4. Dry-run behavior
+
+**Decision**: Dry-run outputs exact packages that would be installed/removed without making any system changes.
+
+**Implementation**:
+- Skip `pacman -Syu` call
+- Skip `pacman -D --asdeps` (mark all installed as deps)
+- Skip `pacman -D --asexplicit` (mark state packages as explicit)
+- Skip `pacman -Rns` orphan cleanup
+- Still compute what WOULD happen for output
+
+**Note on marking strategy**:
+Instead of diffing between before/after installed packages, we simply:
+1. After sync completes, run `pacman -D --asdeps` on ALL currently installed packages (this marks everything as deps)
+2. Then run `pacman -D --asexplicit` on the collected state packages (this overrides them to explicit)
+
+This is simpler and achieves the same result.
+
+## Risks / Trade-offs
+
+1. **[Risk]** dyalpm initialization requires root privileges
+   - **[Mitigation]** This is the same as pacman itself; if the user can't run pacman, declpac won't work
+
+2. **[Risk]** libalpm state becomes stale if another pacman instance runs concurrently
+   - **[Mitigation]** Use proper locking; rely on pacman's own locking mechanism
+
+3. **[Risk]** AUR packages still built sequentially
+   - **[Acceptable]** Parallel AUR builds are out of scope for this change
+
+4. **[Risk]** Memory usage for large package lists
+   - **[Mitigation]** Package map is ~100 bytes per package; 10k packages = ~1MB
+
+## Migration Plan
+
+1. Add `github.com/Jguer/dyalpm` to go.mod
+2. Refactor `ValidatePackage()` to use dyalpm instead of subprocesses
+3. Add `Installed bool` to `PackageInfo` struct
+4. Implement new resolution algorithm in `categorizePackages()`
+5. Update `Sync()` and `DryRun()` to use new algorithm
+6. Test with various package combinations
+7. Verify output matches previous behavior
+
+## Open Questions
+
+- **Q**: Should we also use dyalpm for `GetInstalledPackages()`?
+- **A**: Yes. It can use `localDB.PkgCache().Collect()` or iterate the cache directly, which aligns with the overall approach
\ No newline at end of file
diff --git a/openspec/changes/batch-pacman-checks/proposal.md b/openspec/changes/batch-pacman-checks/proposal.md
new file mode 100644
index 0000000..8d4576b
--- /dev/null
+++ b/openspec/changes/batch-pacman-checks/proposal.md
@@ -0,0 +1,35 @@
+## Why
+
+The current pacman implementation spawns two subprocesses per package (`pacman -Qip`, `pacman -Sip`) to check whether packages exist in the local or sync DBs. With many packages, this creates significant overhead. Using the Jguer/dyalpm library provides direct libalpm access for batch queries, eliminating subprocess overhead while maintaining the batched AUR HTTP calls.
+
+## What Changes
+
+- **Add dyalpm dependency**: Integrate Jguer/dyalpm library for direct libalpm access
+- **Batch local DB check**: Use `localDB.PkgCache()` to check all packages at once instead of per-package `pacman -Qip`
+- **Batch sync DB check**: Use `syncDBs[i].PkgCache()` to check all sync repos at once instead of per-package `pacman -Sip`
+- **Enhance PackageInfo**: Add `Installed bool` field to track if package is already installed
+- **New algorithm**: Implement unified package resolution flow:
+  1. Batch check local DB for all packages
+  2. Batch check sync DBs for remaining packages
+  3. Batch query AUR for packages not found in either
+  4. Track installed status throughout
+  5. Perform sync operations with proper marking
+  6. Output summary of changes
+
+## Capabilities
+
+### New Capabilities
+
+- `batch-package-resolution`: Unified algorithm that batch-resolves packages from local DB → sync DBs → AUR with proper installed tracking
+- `dry-run-simulation`: Shows exact packages that would be installed/removed without making changes
+
+### Modified Capabilities
+
+- None - this is a pure optimization with no behavior changes visible to users
+
+## Impact
+
+- **Code**: `pkg/pacman/pacman.go` - refactored to use dyalpm
+- **Dependencies**: Add Jguer/dyalpm to go.mod
+- **APIs**: `ValidatePackage()` signature changes (returns installed status)
+- **Performance**: 2n subprocess calls for n packages → zero subprocess calls (one in-process pass per DB) for local/sync DB checks
\ No newline at end of file
diff --git a/openspec/changes/batch-pacman-checks/specs/batch-package-resolution/spec.md b/openspec/changes/batch-pacman-checks/specs/batch-package-resolution/spec.md
new file mode 100644
index 0000000..d532b46
--- /dev/null
+++ b/openspec/changes/batch-pacman-checks/specs/batch-package-resolution/spec.md
@@ -0,0 +1,72 @@
+## ADDED Requirements
+
+### Requirement: Batch package resolution from local, sync, and AUR databases
+The system SHALL resolve packages in a single pass through local DB → sync DBs → AUR using batch operations to minimize subprocess/API calls.
+
+#### Scenario: Package exists in local DB
+- **WHEN** a package from collected state exists in the local database
+- **THEN** the system SHALL mark it as found, set `Installed=true`, and exclude it from AUR queries
+
+#### Scenario: Package exists in sync DB
+- **WHEN** a package from collected state does NOT exist in local DB but exists in ANY enabled sync database
+- **THEN** the system SHALL mark it as found, set `Installed=false`, and exclude it from AUR queries
+
+#### Scenario: Package exists only in AUR
+- **WHEN** a package from collected state does NOT exist in local or sync databases but exists in AUR
+- **THEN** the system SHALL mark it as found with `InAUR=true`, set `Installed=false`, and use the cached AUR info
+
+#### Scenario: Package not found anywhere
+- **WHEN** a package from collected state is NOT in local DB, NOT in any sync DB, and NOT in AUR
+- **THEN** the system SHALL return an error listing the unfound package(s)
+
+#### Scenario: Batch AUR query
+- **WHEN** multiple packages need AUR lookup
+- **THEN** the system SHALL make a SINGLE HTTP request to AUR RPC with all package names (existing behavior preserved)
+
+### Requirement: Efficient local DB lookup using dyalpm
+The system SHALL use dyalpm's `PkgCache()` iterator to build a lookup map in O(n) time, where n is the total number of packages in the local DB, instead of spawning one subprocess per requested package.
+
+#### Scenario: Build local package map
+- **WHEN** initializing package resolution
+- **THEN** the system SHALL iterate `localDB.PkgCache()` once and store all package names in a map for O(1) lookups
+
+#### Scenario: Check package in local map
+- **WHEN** checking if a package exists in local DB
+- **THEN** the system SHALL perform an O(1) map lookup instead of spawning a subprocess
+
+### Requirement: Efficient sync DB lookup using dyalpm
+The system SHALL use each sync DB's `PkgCache()` iterator to check packages across all enabled repositories.
+
+#### Scenario: Check package in sync DBs
+- **WHEN** a package is not found in local DB
+- **THEN** the system SHALL check all enabled sync databases using their iterators
+
+#### Scenario: Package found in multiple sync repos
+- **WHEN** a package exists in more than one sync repository (e.g., core and community)
+- **THEN** the system SHALL use the first match found
+
+### Requirement: Track installed status in PackageInfo
+The system SHALL include an `Installed bool` field in `PackageInfo` to indicate whether the package is currently installed.
+
+#### Scenario: Package is installed
+- **WHEN** a package exists in the local database
+- **THEN** `PackageInfo.Installed` SHALL be `true`
+
+#### Scenario: Package is not installed
+- **WHEN** a package exists only in sync DB or AUR (not in local DB)
+- **THEN** `PackageInfo.Installed` SHALL be `false`
+
+### Requirement: Mark installed packages as deps, then state packages as explicit
+After package sync completes, the system SHALL mark all installed packages as dependencies, then override the collected state packages to be explicit. This avoids diffing before/after states.
+
+#### Scenario: Mark all installed as deps
+- **WHEN** package sync has completed (non-dry-run)
+- **THEN** the system SHALL run `pacman -D --asdeps` to mark ALL currently installed packages as dependencies
+
+#### Scenario: Override state packages to explicit
+- **WHEN** all installed packages have been marked as deps
+- **THEN** the system SHALL run `pacman -D --asexplicit` on the collected state packages, overriding their dependency status
+
+#### Scenario: Dry-run skips marking
+- **WHEN** operating in dry-run mode
+- **THEN** the system SHALL NOT execute any `pacman -D` marking operations
\ No newline at end of file
diff --git a/openspec/changes/batch-pacman-checks/specs/dry-run-simulation/spec.md b/openspec/changes/batch-pacman-checks/specs/dry-run-simulation/spec.md
new file mode 100644
index 0000000..40898db
--- /dev/null
+++ b/openspec/changes/batch-pacman-checks/specs/dry-run-simulation/spec.md
@@ -0,0 +1,28 @@
+## ADDED Requirements
+
+### Requirement: Dry-run shows packages to install without making changes
+In dry-run mode, the system SHALL compute what WOULD happen without executing any pacman operations.
+
+#### Scenario: Dry-run lists packages to install
+- **WHEN** dry-run is enabled and packages need to be installed
+- **THEN** the system SHALL populate `Result.ToInstall` with all packages that would be installed (both sync and AUR)
+
+#### Scenario: Dry-run lists packages to remove
+- **WHEN** dry-run is enabled and orphan packages exist
+- **THEN** the system SHALL NOT calculate or populate `Result.ToRemove` - orphan detection is skipped entirely in dry-run mode
+
+#### Scenario: Dry-run skips pacman sync
+- **WHEN** dry-run is enabled
+- **THEN** the system SHALL NOT execute `pacman -Syu` for package installation
+
+#### Scenario: Dry-run skips explicit/deps marking
+- **WHEN** dry-run is enabled
+- **THEN** the system SHALL NOT execute `pacman -D --asdeps` or `pacman -D --asexplicit`
+
+#### Scenario: Dry-run skips orphan cleanup
+- **WHEN** dry-run is enabled
+- **THEN** the system SHALL NOT execute `pacman -Rns` for orphan removal
+
+#### Scenario: Dry-run outputs count summary
+- **WHEN** dry-run is enabled
+- **THEN** the system SHALL still compute and output `Result.Installed` and `Result.Removed` counts as if the operations had run
\ No newline at end of file
diff --git a/openspec/changes/batch-pacman-checks/tasks.md b/openspec/changes/batch-pacman-checks/tasks.md
new file mode 100644
index 0000000..0a27b17
--- /dev/null
+++ b/openspec/changes/batch-pacman-checks/tasks.md
@@ -0,0 +1,51 @@
+## 1. Setup
+
+- [ ] 1.1 Add `github.com/Jguer/dyalpm` to go.mod
+- [ ] 1.2 Run `go mod tidy` to fetch dependencies
+
+## 2. Core Refactoring
+
+- [ ] 2.1 Update `PackageInfo` struct to add `Installed bool` field
+- [ ] 2.2 Create `Pac` struct with `alpm.Handle` instead of just aurCache
+- [ ] 2.3 Implement `NewPac()` that initializes alpm handle and local/sync DBs
+
+## 3. Package Resolution Algorithm
+
+- [ ] 3.1 Implement `buildLocalPkgMap()` - iterate localDB.PkgCache() to create lookup map
+- [ ] 3.2 Implement `checkSyncDBs()` - iterate each sync DB's PkgCache() to find packages
+- [ ] 3.3 Implement `resolvePackages()` - unified algorithm:
+  - Step 1: Check local DB for all packages (batch)
+  - Step 2: Check sync DBs for remaining packages (batch per repo)
+  - Step 3: Batch query AUR for remaining packages
+  - Step 4: Return error if any package unfound
+  - Step 5: Track installed status from local DB
+
+## 4. Sync and DryRun Integration
+
+- [ ] 4.1 Refactor `Sync()` function to use new resolution algorithm
+- [ ] 4.2 Refactor `DryRun()` function to use new resolution algorithm
+- [ ] 4.3 Preserve AUR batched HTTP calls (existing `fetchAURInfo`)
+- [ ] 4.4 Preserve orphan cleanup logic (`CleanupOrphans()`)
+
+## 5. Marking Operations
+
+- [ ] 5.1 Keep `MarkExplicit()` for marking state packages
+- [ ] 5.2 After sync, run `pacman -D --asdeps` on ALL installed packages (simplifies tracking)
+- [ ] 5.3 After deps marking, run `pacman -D --asexplicit` on collected state packages (overrides deps)
+- [ ] 5.4 Skip marking operations in dry-run mode
+
+## 6. Cleanup and Output
+
+- [ ] 6.1 Remove subprocess-based `ValidatePackage()` implementation
+- [ ] 6.2 Remove subprocess-based `GetInstalledPackages()` implementation
+- [ ] 6.3 Update output summary to show installed/removed counts
+- [ ] 6.4 In dry-run mode, populate `ToInstall` and `ToRemove` lists
+
+## 7. Testing
+
+- [ ] 7.1 Test with packages in local DB only
+- [ ] 7.2 Test with packages in sync DBs only
+- [ ] 7.3 Test with AUR packages
+- [ ] 7.4 Test with missing packages (should error)
+- [ ] 7.5 Test dry-run mode output
+- [ ] 7.6 Test orphan detection and cleanup
\ No newline at end of file
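
The resolution flow that design.md (decision 2) and task 3.3 describe can be sketched without any libalpm binding. This is only a minimal illustration under stated assumptions, not the code in `pkg/pacman/pacman.go`: plain `map[string]bool` name sets stand in for the maps that would be built from each DB's package cache, and `aurLookup`, `inAnyRepo`, and the `Exists` field are hypothetical names introduced for the example. Only `Installed` and `InAUR` come from the spec.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// PackageInfo carries the two flags named in the spec (Installed, InAUR).
// Exists is a hypothetical field added for this sketch.
type PackageInfo struct {
	Name      string
	Exists    bool
	Installed bool
	InAUR     bool
}

// aurLookup is a hypothetical stand-in for the existing batched AUR request:
// it receives every unresolved name at once and returns the names found.
type aurLookup func(names []string) (map[string]bool, error)

// inAnyRepo reports whether name is present in at least one sync repo set.
func inAnyRepo(name string, syncRepos []map[string]bool) bool {
	for _, repo := range syncRepos {
		if repo[name] {
			return true
		}
	}
	return false
}

// resolvePackages classifies each wanted package as local, sync, or AUR.
// local and syncRepos are name sets assumed to be built by one pass over
// each database, so every check here is an O(1) map lookup.
func resolvePackages(wanted []string, local map[string]bool, syncRepos []map[string]bool, aur aurLookup) (map[string]*PackageInfo, error) {
	result := make(map[string]*PackageInfo, len(wanted))
	var pending []string // names that must go to the AUR batch

	for _, name := range wanted {
		info := &PackageInfo{Name: name}
		result[name] = info
		switch {
		case local[name]:
			info.Exists, info.Installed = true, true
		case inAnyRepo(name, syncRepos):
			info.Exists = true
		default:
			pending = append(pending, name)
		}
	}
	if len(pending) == 0 {
		return result, nil
	}

	// One AUR call for all remaining names, never one call per package.
	found, err := aur(pending)
	if err != nil {
		return nil, fmt.Errorf("aur lookup: %w", err)
	}
	var missing []string
	for _, name := range pending {
		if found[name] {
			result[name].Exists, result[name].InAUR = true, true
		} else {
			missing = append(missing, name)
		}
	}
	if len(missing) > 0 {
		sort.Strings(missing)
		return nil, fmt.Errorf("package(s) not found: %s", strings.Join(missing, ", "))
	}
	return result, nil
}

func main() {
	local := map[string]bool{"vim": true}
	syncRepos := []map[string]bool{{"git": true}, {"htop": true}}
	calls := 0
	aur := func(names []string) (map[string]bool, error) {
		calls++ // count requests to show that the lookup is batched
		return map[string]bool{"yay": true}, nil
	}

	wanted := []string{"vim", "git", "yay"}
	res, err := resolvePackages(wanted, local, syncRepos, aur)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, name := range wanted {
		fmt.Printf("%s installed=%t aur=%t\n", name, res[name].Installed, res[name].InAUR)
	}
	fmt.Println("AUR requests:", calls)
}
```

Keeping the DB contents behind plain maps also makes the algorithm testable without root privileges or a real pacman database, which may help with tasks 7.1 through 7.4.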