isparkclaw-webui/api/updates.py at 678a7f6e492b4b6c9ec35fcd62ab5bb04df55643

Files
nesquena-hermes a4d59b9e6c fix: update banner — conflict recovery path + server self-restart after update (#816 )
* fix: update banner conflict recovery + server self-restart after update (#813 #814)

* fix(update): restart must wait for in-flight update + reset force button on retry

Two defects in the update banner flow found during review of PR #816:

1. Two-target race (webui + agent sequential)
   The client posts targets sequentially: webui succeeds and schedules
   a restart timer (2 s delay); client then posts agent; server begins
   agent fetch+pull; at T=2 s the restart timer fires os.execv mid-pull,
   killing the agent update and closing the client connection. User
   sees "Update failed (agent): Failed to fetch" even though webui did
   update, and the agent repo is in an unknown partial state.

   Fix: _schedule_restart() now blocks on _apply_lock before calling
   os.execv. If a second update is in flight when the timer fires, the
   restart thread waits until it completes. If nothing is in flight the
   lock acquire is instant, so no-op updates still restart immediately.

2. Stale force-update button across retries
   _showUpdateError sets btnForceUpdate to display:inline-block when
   res.conflict / res.diverged. Nothing resets it on the next retry,
   so a subsequent non-conflict error (e.g. network) leaves the stale
   force button visible pointing at the previous target.

   Fix: applyUpdates() now hides the force button and clears its
   data-target at the start of each attempt.

Tests:
- test_schedule_restart_waits_for_apply_lock: holds _apply_lock from a
  helper thread, verifies execv is delayed until the lock is released.
- test_schedule_restart_still_fires_when_no_update_in_flight: sanity
  check that the common path still works with no contention.
- test_apply_updates_resets_force_button_at_start: regression guard
  that the reset appears before the update loop begins.

Full suite: 1683 passed, 0 failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(update): hold _apply_lock through execv + fix banner error layout

Two fixes from Opus review:

1. TOCTOU gap in _schedule_restart (api/updates.py): the original pattern
   acquired _apply_lock, released it, then called os.execv — leaving a brief
   window where a new update could start between release and execv. Fixed by
   moving os.execv inside the 'with _apply_lock:' block so the process is
   replaced while still holding the lock; no new update can acquire it.

2. Banner CSS layout (static/index.html): #updateError was a direct flex child
   of .update-banner (display:flex row), so long error messages sat inline
   between #updateMsg and the buttons instead of below the message.
   Wrapped #updateMsg + #updateError in a flex-column container so errors
   stack vertically under the status line.

* docs: add v0.50.134 CHANGELOG entry

---------

Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 17:10:41 -07:00
16 KiB

Raw Blame History

View Raw
16 KiB Raw Blame History

16 KiB

Raw Blame History