fix: update banner — conflict recovery path + server self-restart after update (#816)

* fix: update banner conflict recovery + server self-restart after update (#813 #814)

* fix(update): restart must wait for in-flight update + reset force button on retry

Two defects in the update banner flow found during review of PR #816:

1. Two-target race (webui + agent sequential)
   The client posts targets sequentially: webui succeeds and schedules
   a restart timer (2 s delay); client then posts agent; server begins
   agent fetch+pull; at T=2 s the restart timer fires os.execv mid-pull,
   killing the agent update and closing the client connection. User
   sees "Update failed (agent): Failed to fetch" even though webui did
   update, and the agent repo is in an unknown partial state.

   Fix: _schedule_restart() now blocks on _apply_lock before calling
   os.execv. If a second update is in flight when the timer fires, the
   restart thread waits until it completes. If nothing is in flight the
   lock acquire is instant, so no-op updates still restart immediately.

2. Stale force-update button across retries
   _showUpdateError sets btnForceUpdate to display:inline-block when
   res.conflict / res.diverged. Nothing resets it on the next retry,
   so a subsequent non-conflict error (e.g. network) leaves the stale
   force button visible pointing at the previous target.

   Fix: applyUpdates() now hides the force button and clears its
   data-target at the start of each attempt.

Tests:
- test_schedule_restart_waits_for_apply_lock: holds _apply_lock from a
  helper thread, verifies execv is delayed until the lock is released.
- test_schedule_restart_still_fires_when_no_update_in_flight: sanity
  check that the common path still works with no contention.
- test_apply_updates_resets_force_button_at_start: regression guard
  that the reset appears before the update loop begins.

Full suite: 1683 passed, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(update): hold _apply_lock through execv + fix banner error layout

Two fixes from Opus review:

1. TOCTOU gap in _schedule_restart (api/updates.py): the original pattern
   acquired _apply_lock, released it, then called os.execv — leaving a brief
   window where a new update could start between release and execv. Fixed by
   moving os.execv inside the 'with _apply_lock:' block so the process is
   replaced while still holding the lock; no new update can acquire it.

2. Banner CSS layout (static/index.html): #updateError was a direct flex child
   of .update-banner (display:flex row), so long error messages sat inline
   between #updateMsg and the buttons instead of below the message.
   Wrapped #updateMsg + #updateError in a flex-column container so errors
   stack vertically under the status line.

* docs: add v0.50.134 CHANGELOG entry

---------

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
nesquena-hermes
2026-04-21 17:10:41 -07:00
committed by GitHub
parent 811424a87b
commit a4d59b9e6c
6 changed files with 639 additions and 8 deletions

View File

@@ -1,5 +1,10 @@
# Hermes Web UI -- Changelog
## [v0.50.134] — 2026-04-21
### Fixed
- **Update banner: conflict/diverged recovery path + server self-restart after update** — three failure modes resolved. (1) `Update failed (agent): Repository has unresolved merge conflicts` was a dead-end with no recovery path; the error now includes an actionable `git checkout . && git pull --ff-only` command, a persistent inline display (not a fleeting toast), and a **Force update** button that executes the reset via the new `POST /api/updates/force` endpoint. (2) After a successful update, the server now self-restarts via `os.execv` (2 s delay), eliminating the stale-`sys.modules` bug that broke custom provider chat on the next request. (3) When both webui and agent updates are pending, the restart now correctly waits for the second update to complete before re-executing (`_apply_lock` coordination), preventing the mid-pull kill race. Closes #813, #814. (#816)
## [v0.50.133] — 2026-04-21
### Added