fix(smd): strip javascript:/data:/vbscript: URLs — smd does not sanitize schemes

streaming-markdown@0.2.15 preserves arbitrary URL schemes in href/src.
Verified with a Node + jsdom harness:

  IN : [click](javascript:alert(1))
  OUT: <p><a href="javascript:alert(1">click</a>)</p>        ← XSS vector

Confirmed unsafe for: javascript:, vbscript:, data:text/html, file://.
The library uses only safe DOM primitives (createElement/appendChild/
createTextNode — no innerHTML/eval), so <script> tags are escaped as
text, but URL-scheme filtering is absent. The existing renderMd() path
implicitly filtered to http(s) via its regex, so this is a regression
the moment streaming markdown is enabled.

Attack path: agent echoes prompt-injection content containing a
markdown link with javascript: href → smd renders it live → user clicks
during the streaming window → JS executes in the webui origin, where it
can reach the session cookie and make authenticated API calls.

Fix: walk the live DOM after each parser_write (and again after
parser_end) and remove href/src attributes whose scheme isn't on the
safe allowlist (http, https, mailto, tel, and relative/anchor paths).
Blocked anchors keep their text content but lose href; blocked images
lose src. Both get data-blocked-scheme="1" for debugging.
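
Because the allowlist is anchored at the start of the raw attribute
value, obfuscated forms that a browser would normalize back to
javascript: should also fail the check. The sketch below reuses the
regex from this commit; the probe values are illustrative assumptions
and were not part of the original harness.

```javascript
// Same pattern as _SMD_SAFE_URL_RE in this commit.
const SAFE_URL_RE = /^(?:https?:|mailto:|tel:|\/|#|\?|\.)/i;

// Hypothetical probe values, not taken from the original harness.
const probes = [
  ' javascript:alert(1)',      // leading space
  'JaVaScRiPt:alert(1)',       // mixed case
  'java\tscript:alert(1)',     // embedded tab
  '\u0001javascript:alert(1)', // leading control character
  'HTTPS://example.com/',      // upper-case safe scheme
  '//example.com/x',           // protocol-relative, matches the "/" branch
];

// Pair each probe with the decision the sanitizer would make for it.
const verdicts = probes.map((v) => [
  JSON.stringify(v),
  SAFE_URL_RE.test(v) ? 'kept' : 'stripped',
]);
console.table(verdicts);
```

Two consequences of the pattern are worth knowing: protocol-relative
URLs such as //example.com/x are kept, and bare relative paths such as
img/a.png are stripped because they do not start with /, ., # or ?.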

Harness confirms all 10 tested cases behave correctly — javascript:,
vbscript:, data:text/html, file:// all stripped; https://, /path,
#anchor, mailto:, tel: all preserved.
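
A minimal sketch of the allowlist portion of that check, runnable in
plain Node without jsdom. The regex is the one added in this commit;
the isAllowedUrl helper and the concrete URLs are assumptions chosen to
cover the schemes listed above, not the actual harness inputs.

```javascript
// Same pattern as _SMD_SAFE_URL_RE in this commit.
const SAFE_URL_RE = /^(?:https?:|mailto:|tel:|\/|#|\?|\.)/i;

// Hypothetical helper: true when the raw attribute value would be kept.
function isAllowedUrl(value) {
  return SAFE_URL_RE.test(value || '');
}

// Example URLs for the schemes reported as stripped.
const stripped = [
  'javascript:alert(1)',
  'vbscript:msgbox(1)',
  'data:text/html,<script>alert(1)</script>',
  'file:///etc/passwd',
];

// Example URLs for the forms reported as preserved.
const preserved = [
  'https://example.com/',
  '/path',
  '#anchor',
  'mailto:user@example.com',
  'tel:+15555550100',
];

const results = {};
for (const url of stripped.concat(preserved)) {
  results[url] = isAllowedUrl(url);
}
console.log(results);
```

This only exercises the regex; the DOM walk itself still needs a
document (jsdom or a browser) to verify.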

Added 5 regression tests in TestSmdUrlSchemeSanitization that lock:
  - the sanitize helper exists
  - the allowlist regex permits https? and forbids javascript/vbscript/data:
  - _smdWrite invokes sanitize after parser_write
  - _smdEndParser invokes sanitize after parser_end
  - the sanitizer covers both <a href> and <img src>

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Nathan Esquenazi
2026-04-23 16:28:40 -07:00
parent 89b0c8eb41
commit b563484a56
2 changed files with 102 additions and 0 deletions

@@ -384,6 +384,9 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
function _smdEndParser(){
  if(_smdParser&&window.smd){
    try{window.smd.parser_end(_smdParser);}catch(_){}
    // parser_end may flush remaining markdown that creates new links/images, so
    // re-sanitize the body before the DOM is handed off to highlightCode / renderMessages.
    if(assistantBody){_sanitizeSmdLinks(assistantBody);}
  }
  _smdParser=null;
  _smdWrittenLen=0;
@@ -396,6 +399,31 @@ function attachLiveStream(activeSid, streamId, uploaded=[], options={}){
  if(!delta) return;
  try{window.smd.parser_write(_smdParser,delta);}catch(_){}
  _smdWrittenLen=displayText.length;
  // streaming-markdown does NOT sanitize URL schemes: `[click](javascript:...)`
  // and `![alt](javascript:...)` survive as href/src. Strip any unsafe schemes
  // from anchors/images that were just added to the live DOM. The existing
  // renderMd() path filters these via its http(s)-only regex; we need a matching
  // guard here so the live-stream path isn't an XSS vector for agent-echoed
  // prompt-injection content. The final renderMessages() call at `done` uses
  // renderMd which is already safe, but during streaming the user could click
  // a malicious link before that replacement happens.
  if(assistantBody){_sanitizeSmdLinks(assistantBody);}
}
// Allowed URL schemes for anchors and images rendered from agent-streamed markdown.
// Based on the effective allowlist of renderMd() (http/https via regex + relative),
// plus mailto: and tel:.
const _SMD_SAFE_URL_RE=/^(?:https?:|mailto:|tel:|\/|#|\?|\.)/i;
function _sanitizeSmdLinks(root){
  if(!root||!root.querySelectorAll) return;
  // Anchors: drop an unsafe href (the link text stays) and mark the node.
  const _a=root.querySelectorAll('a[href]');
  for(let i=0;i<_a.length;i++){
    const n=_a[i],v=n.getAttribute('href')||'';
    if(!_SMD_SAFE_URL_RE.test(v)){n.removeAttribute('href');n.setAttribute('data-blocked-scheme','1');}
  }
  // Images: drop an unsafe src and mark the node.
  const _im=root.querySelectorAll('img[src]');
  for(let i=0;i<_im.length;i++){
    const n=_im[i],v=n.getAttribute('src')||'';
    if(!_SMD_SAFE_URL_RE.test(v)){n.removeAttribute('src');n.setAttribute('data-blocked-scheme','1');}
  }
}
function _scheduleRender(){
  if(_renderPending) return;