Commit Graph

5 Commits

Author SHA1 Message Date
nesquena-hermes
d4a3adb7b1 fix(sessions): surface gateway SSE failures and add polling fallback (#828)
* fix(sessions): surface gateway SSE failures and add polling fallback

- add a JSON probe mode for the gateway SSE endpoint
- detect watcher-unavailable 503s from the browser
- fall back to periodic session refresh with a toast
- add probe payload tests and endpoint coverage

Fixes #635

* fix(sessions): surface gateway SSE failures and add polling fallback (#826)

Absorbed from PR #826 by @cloudyun888 (fixes #635).

When the gateway watcher thread is not running, the browser now shows a
toast notification and falls back to 30-second periodic polling for session
sync. Previously the SSE failure was completely silent with no user feedback.

Changes from original PR:
- Deleted misplaced test_gateway_sse_probe_unit.py (was at repo root, not
  discovered by `pytest tests/`); unit tests moved into tests/test_gateway_sync.py
- _gateway_sse_probe_payload now checks watcher._thread.is_alive() rather
  than just watcher is not None — a watcher instance with a dead poll thread
  now correctly reports unavailable and activates the polling fallback
- probeGatewaySSEStatus catch(e) now starts the polling fallback on network
  error rather than silently swallowing the failure
- Added 5 unit tests covering all watcher-alive/dead/missing/disabled branches

Co-authored-by: cloudyun888 <269269188+86cloudyun-afk@users.noreply.github.com>

* cleanup(gateway): public is_alive() + dedup probe/live watcher-alive check + changelog

Three small cleanups on top of @cloudyun888's PR #826 absorption:

1. Add GatewayWatcher.is_alive() public accessor so routes.py doesn't
   reach into the private _thread attribute.  The existing private-
   attribute check stays as a defensive fallback for any older in-
   memory instance or test double that doesn't implement the full API.

2. Dedupe the watcher_alive computation in _handle_gateway_sse_stream:
   the live-SSE path now calls _gateway_sse_probe_payload(...) and reads
   its watcher_running field instead of re-deriving the same logic
   inline.  Keeps probe and SSE in sync automatically.

3. CHANGELOG trailer was (#826, fixes #635, @cloudyun888) — this PR is
   #828, so updated to (#828, absorbs PR #826 by @cloudyun888, fixes
   #635) matching the repo convention for absorbed PRs (see #805).

Added two regression tests:
- test_gateway_watcher_is_alive_public_method — covers the three
  lifecycle states (before start, while running, after stop).
- test_probe_payload_prefers_public_is_alive — asserts the probe
  uses watcher.is_alive() rather than poking _thread when the
  public method exists.

Full suite: 1735 passed, 0 new failures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: cloudyun888 <269269188+86cloudyun-afk@users.noreply.github.com>
Co-authored-by: nesquena-hermes <nesquena-hermes@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:18:55 -07:00
Nathan Esquenazi
f1590fdb07 fix(sessions): return None instead of 'unknown' for missing gateway session model (fixes #443) 2026-04-14 19:06:22 +00:00
Nathan Esquenazi
a4136f2da5 fix(gateway): filter orphan sessions from SSE watcher (HAVING COUNT > 0) 2026-04-14 17:54:30 +00:00
nesquena-hermes
dd17a0e9b7 security: bandit fixes B310/B324/B110 + QuietHTTPServer (#354)
* security: fix bandit security issues (B310, B324)

- Add usedforsecurity=False to MD5 hash in gateway_watcher.py
- Add URL scheme validation to prevent file:// access in config.py
- Add URL validation to bootstrap.py health check
- Add nosec comments where runtime validation exists

* fix: handle ConnectionResetError gracefully and add debug logging

- Add QuietHTTPServer class to suppress noisy connection reset errors
  caused by clients disconnecting abruptly (fixes log spam from
  'ConnectionResetError: [Errno 54] Connection reset by peer')

- Replace silent 'pass' statements with logger.debug() calls across
  api/auth.py, api/config.py, api/gateway_watcher.py, api/models.py,
  and api/onboarding.py for better observability during troubleshooting

- All tests pass (25 passed in test_regressions.py)

* chore: add debug logging to profiles and routes modules

- Replace silent 'pass' statements with logger.debug() calls in
  api/profiles.py for better error visibility during profile switching
  and module patching

- Add logger initialization to api/routes.py

* security: fix B110 bare except/pass issues (bandit security scan)

- Replace bare except/pass patterns with logger.debug() calls
- Fixes CWE-703 (improper check/handling of exceptional conditions)
- Files affected: routes.py, state_sync.py, streaming.py, workspace.py, server.py
- All tests pass successfully

* security: bandit fixes B310/B324/B110 + QuietHTTPServer (#354)

- api/gateway_watcher.py: MD5 usedforsecurity=False (B324)
- api/config.py, bootstrap.py: URL scheme validation before urlopen (B310)
- 12 files: replace bare except/pass with logger.debug() (B110)
- server.py: QuietHTTPServer suppresses client disconnect log noise
- server.py: fix sys.exc_info() (was traceback.sys.exc_info(), impl detail)
- tests/test_sprint43.py: 19 new tests covering all security fixes
- CHANGELOG.md: v0.50.14 entry; 841 tests total (up from 822)

---------

Co-authored-by: lawrencel1ng <lawrence.ling@global.ntt>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
2026-04-13 11:11:56 -07:00
nesquena-hermes
711bb5a6c9 feat: real-time gateway session sync (Phase 1) (#274)
* feat: add real-time gateway session sync (Phase 1)

- Add gateway_watcher.py: background daemon polling state.db every 5s
  for gateway session changes (telegram, discord, slack, etc.)
- Extend get_cli_sessions() to include all non-webui sources
- Add SSE endpoint /api/sessions/gateway/stream for real-time push
- Add dynamic source badges (telegram=blue, discord=purple, slack=dark purple)
- Rename 'Show CLI sessions' to 'Show agent sessions'
- Wire watcher lifecycle into server start/stop
- 10 tests covering metadata, filtering, SSE, and watcher lifecycle
- Activated via the same checkbox as CLI session import

Addresses GitHub issue #272

* fix: SSE event name mismatch, TLS attribute, remove PLAN.md

- Fix critical SSE bug: frontend listened for 'gateway_session_update'
  but backend sends 'sessions_changed' -- events were silently dropped
- Fix frontend field check: data.changed -> data.sessions (matches
  the actual payload structure from gateway_watcher)
- Fix TLS: ssl.TLSv1_2 -> ssl.TLSVersion.TLSv1_2 (the bare attribute
  does not exist, would crash TLS setup and silently fall back to HTTP)
- Remove PLAN.md: implementation plan should not be committed to repo

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: test isolation and slow-consumer sentinel in gateway sync

tests/test_gateway_sync.py:
- Fix _get_test_state_dir() path mismatch: the function was computing
  HERMES_HOME/webui-mvp-test but conftest.py sets HERMES_HOME=TEST_STATE_DIR,
  so state.db was written to a double-nested path the server never read.
  Now uses HERMES_WEBUI_STATE_DIR first (which conftest sets directly to
  TEST_STATE_DIR), fixing the 7/10 test failures in full-suite ordering.
- Fix conn cleanup: removed conn.close() from inside try blocks so the
  connection stays valid for _remove_test_sessions() in the finally block.
  Previously the closed conn caused ProgrammingError in finally (swallowed
  by bare except), leaving ghost sessions in state.db on test failure.

api/gateway_watcher.py:
- Fix slow-consumer queue eviction: when a subscriber queue fills (>10 events)
  and is removed from _subscribers, now puts a None sentinel into it so the
  SSE handler unblocks and closes the connection, letting EventSource
  auto-reconnect. Without this the connection stayed open but received no
  further events.

* fix: test isolation — set HERMES_WEBUI_TEST_STATE_DIR in conftest

The gateway sync tests write directly to state.db and must use the same
path the test server reads from.  Previously they computed the path
independently, which broke when test_auth_sessions.py set a different
HERMES_WEBUI_STATE_DIR in the test-process environment at import time.

tests/conftest.py:
- Set HERMES_WEBUI_TEST_STATE_DIR=TEST_STATE_DIR in the test process's
  os.environ (via setdefault) so gateway tests can read it reliably.
  Using setdefault preserves any explicit override the caller may pass.

tests/test_gateway_sync.py:
- Simplify _get_test_state_dir(): check HERMES_WEBUI_TEST_STATE_DIR first
  (now reliably set by conftest), fall back to HERMES_HOME/webui-mvp-test.
  Remove the workaround that tried to snapshot HERMES_HOME at import time.

Result: 658/658 tests pass in full-suite ordering (was 651 pass / 7 fail).

---------

Co-authored-by: bergeouss <bergeouss@users.noreply.github.com>
Co-authored-by: Nathan Esquenazi <nesquena@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 20:53:12 -07:00