* fix: update banner conflict recovery + server self-restart after update (#813 #814) * fix(update): restart must wait for in-flight update + reset force button on retry Two defects in the update banner flow found during review of PR #816: 1. Two-target race (webui + agent sequential) The client posts targets sequentially: webui succeeds and schedules a restart timer (2 s delay); client then posts agent; server begins agent fetch+pull; at T=2 s the restart timer fires os.execv mid-pull, killing the agent update and closing the client connection. User sees "Update failed (agent): Failed to fetch" even though webui did update, and the agent repo is in an unknown partial state. Fix: _schedule_restart() now blocks on _apply_lock before calling os.execv. If a second update is in flight when the timer fires, the restart thread waits until it completes. If nothing is in flight the lock acquire is instant, so no-op updates still restart immediately. 2. Stale force-update button across retries _showUpdateError sets btnForceUpdate to display:inline-block when res.conflict / res.diverged. Nothing resets it on the next retry, so a subsequent non-conflict error (e.g. network) leaves the stale force button visible pointing at the previous target. Fix: applyUpdates() now hides the force button and clears its data-target at the start of each attempt. Tests: - test_schedule_restart_waits_for_apply_lock: holds _apply_lock from a helper thread, verifies execv is delayed until the lock is released. - test_schedule_restart_still_fires_when_no_update_in_flight: sanity check that the common path still works with no contention. - test_apply_updates_resets_force_button_at_start: regression guard that the reset appears before the update loop begins. Full suite: 1683 passed, 0 failures. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(update): hold _apply_lock through execv + fix banner error layout Two fixes from Opus review: 1. TOCTOU gap in _schedule_restart (api/updates.py): the original pattern acquired _apply_lock, released it, then called os.execv — leaving a brief window where a new update could start between release and execv. Fixed by moving os.execv inside the 'with _apply_lock:' block so the process is replaced while still holding the lock; no new update can acquire it. 2. Banner CSS layout (static/index.html): #updateError was a direct flex child of .update-banner (display:flex row), so long error messages sat inline between #updateMsg and the buttons instead of below the message. Wrapped #updateMsg + #updateError in a flex-column container so errors stack vertically under the status line. * docs: add v0.50.134 CHANGELOG entry --------- Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com> Co-authored-by: Nathan Esquenazi <nesquena@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
16 KiB
16 KiB