55m 41s logged

making Candle actually WASM-compatible

so remember the “no #[cfg] in cross-cutting interfaces” rule? the CandleEmbedder was breaking it. not with #[cfg] exactly, but with hf-hub, which uses filesystem APIs (dirs, mmap) that don’t exist in WASM. the model loading was desktop-only and I was pretending that was fine.

it was not fine. I fixed it.

hf-hub is gone

replaced with reqwest. instead of hf-hub’s sync filesystem API that downloads to a local cache directory, the embedder now does raw HTTP GETs to huggingface.co/{model}/resolve/main/{file}. three downloads: model.safetensors, config.json, tokenizer.json. the bytes stay in memory, no filesystem touch.
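
a rough sketch of the URL side of this, std only. resolve_url, model_urls and MODEL_FILES are names made up for this example, not the real code. the actual fetch would be something like reqwest::get(url).await?.bytes().await?, left out here so the snippet stands alone.

```rust
/// The three files the embedder needs.
const MODEL_FILES: [&str; 3] = ["model.safetensors", "config.json", "tokenizer.json"];

/// Build the download URL for one file of a HuggingFace model repo,
/// following the huggingface.co/{model}/resolve/main/{file} pattern.
/// `model` is a repo id such as "org/name" (hypothetical helper).
fn resolve_url(model: &str, file: &str) -> String {
    format!("https://huggingface.co/{}/resolve/main/{}", model, file)
}

/// URLs for all three files of a given model repo (hypothetical helper).
fn model_urls(model: &str) -> Vec<String> {
    MODEL_FILES.iter().map(|f| resolve_url(model, f)).collect()
}
```

each URL gets one GET, and the response body is kept as a byte buffer and handed straight to the loader.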

reqwest is platform-conditional in Cargo.toml: rustls on native (no OpenSSL dependency), bare defaults on wasm32 (uses browser fetch under the hood). plus getrandom with the wasm_js feature so random number generation works in WASM.
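
roughly what that looks like in Cargo.toml. the version numbers and the rustls-tls feature name are assumptions based on reqwest 0.12 and getrandom 0.3, so check them against your own lockfile.

```toml
# native: rustls instead of the default TLS backend, so no OpenSSL
[target.'cfg(not(target_arch = "wasm32"))'.dependencies]
reqwest = { version = "0.12", default-features = false, features = ["rustls-tls"] }

# wasm32: plain defaults, requests go through the browser's fetch
[target.'cfg(target_arch = "wasm32")'.dependencies]
reqwest = { version = "0.12" }
getrandom = { version = "0.3", features = ["wasm_js"] }
```

depending on the getrandom version, the wasm build may also need --cfg getrandom_backend="wasm_js" in RUSTFLAGS on top of the feature.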

two feature flags instead of one

the candle feature now just gives you the model and tokenizer types. no download capability, no reqwest. you construct a CandleEmbedder from bytes you already have.

the candle-load feature adds reqwest and the CandleEmbedder::load() method that downloads from HuggingFace. this is the one that pulls in the network stack.

the split matters because the WASM build might want to load model bytes from OPFS or a bundled asset instead of downloading every time. the download path is opt-in.
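
a sketch of how the two flags could be wired up. the dependency lists are a guess at what each feature needs, not copied from the real manifest, and every dep: entry has to be declared with optional = true for this to work.

```toml
[features]
# model + tokenizer types only, no network stack
candle = ["dep:candle-core", "dep:candle-nn", "dep:tokenizers"]
# opt-in download path on top of candle
candle-load = ["candle", "dep:reqwest"]
```

with this shape, --features candle never compiles reqwest at all, and --features candle-load implies candle.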

mmap is gone too

from_mmaped_safetensors (with its unsafe block) became from_buffered_safetensors. loads from a byte vec instead of memory-mapping a file path. works everywhere. no unsafe. the vocab_size is now read from config.json instead of hardcoded to 30522.

Tokenizer::from_file became Tokenizer::from_bytes. same pattern.

error handling cleanup

every candle operation was using ? directly, which only works if PenumbraError implements From<candle_core::Error>. it doesn’t, and it shouldn’t, because candle errors are an implementation detail. added an e_msg helper that wraps any Display into PenumbraError::Embedding, and switched every candle call to .map_err(e_msg)?. verbose but correct.
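
a minimal sketch of the pattern. PenumbraError here is a cut-down stand-in with only the Embedding variant, and parse_dim is a made-up function whose std parse error plays the role of a candle error (no From impl, so a bare ? would not compile).

```rust
use std::fmt::Display;

/// Stand-in for the real error enum; only the variant mentioned above.
#[derive(Debug, PartialEq)]
enum PenumbraError {
    Embedding(String),
}

/// Wrap anything that implements Display into PenumbraError::Embedding.
fn e_msg<E: Display>(e: E) -> PenumbraError {
    PenumbraError::Embedding(e.to_string())
}

/// Hypothetical fallible call standing in for a candle operation.
/// ParseIntError has no From conversion into PenumbraError,
/// so the error is translated explicitly before `?`.
fn parse_dim(s: &str) -> Result<usize, PenumbraError> {
    let dim = s.trim().parse::<usize>().map_err(e_msg)?;
    Ok(dim)
}
```

the foreign error type never shows up in the public signature. only its message survives, which is the point.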

ArcticEmbedXS::new and forward now return candle’s own error type instead of PenumbraError. the boundary between “candle stuff” and “penumbra stuff” is cleaner. the CandleEmbedder wrapper handles the translation.

tests

new candle tests live behind #[cfg(feature = "candle")]. the trick: building a synthetic safetensors file and a minimal WordLevel tokenizer entirely in memory. no model download needed for the test suite.

test_safetensors() generates fake embedding weights and encoder weights with deterministic values, writes the safetensors file by hand (length prefix + JSON metadata + raw f32 bytes), which I much prefer to pulling in a dependency just for that, and hands it to VarBuilder. test_tokenizer_bytes() builds a 10-word vocabulary with a Whitespace pre-tokenizer.
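
here is a sketch of hand-writing that layout, std only. build_safetensors is a hypothetical helper, not the actual test code. it assumes f32 tensors only and tensor names that need no JSON escaping. the space padding to an 8-byte boundary is my addition to keep the data section aligned.

```rust
/// Serialize named f32 tensors into the safetensors layout:
/// 8-byte little-endian header length, JSON header, then raw tensor bytes.
/// Each entry is (name, shape, values). Hypothetical helper for illustration.
fn build_safetensors(tensors: &[(&str, Vec<usize>, Vec<f32>)]) -> Vec<u8> {
    let mut entries = Vec::new();
    let mut data = Vec::new();
    for (name, shape, values) in tensors {
        // The element count must match the declared shape.
        assert_eq!(shape.iter().product::<usize>(), values.len());
        let start = data.len();
        for v in values {
            data.extend_from_slice(&v.to_le_bytes());
        }
        let dims: Vec<String> = shape.iter().map(|d| d.to_string()).collect();
        // data_offsets are byte offsets relative to the start of the data section.
        entries.push(format!(
            "\"{}\":{{\"dtype\":\"F32\",\"shape\":[{}],\"data_offsets\":[{},{}]}}",
            name,
            dims.join(","),
            start,
            data.len()
        ));
    }
    let mut header = format!("{{{}}}", entries.join(",")).into_bytes();
    // Pad the JSON header with spaces up to an 8-byte boundary.
    while header.len() % 8 != 0 {
        header.push(b' ');
    }
    let mut out = Vec::with_capacity(8 + header.len() + data.len());
    out.extend_from_slice(&(header.len() as u64).to_le_bytes());
    out.extend_from_slice(&header);
    out.extend_from_slice(&data);
    out
}
```

the resulting Vec<u8> is the kind of buffer you would pass to VarBuilder::from_buffered_safetensors along with a dtype and device.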

tests cover: forward pass output shape, L2 normalization, non-zero output, embedder dimensions, embed_text roundtrip. plus one candle-load gated test that actually downloads the real model from HuggingFace (only runs when you explicitly pass --features candle-load).
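
for a feel of what the L2 normalization check boils down to, here is a std-only sketch. l2_norm and l2_normalize are illustrative helpers, not the embedder's real code, which does this on candle tensors.

```rust
/// Euclidean (L2) norm of a vector.
fn l2_norm(v: &[f32]) -> f32 {
    v.iter().map(|x| x * x).sum::<f32>().sqrt()
}

/// Scale a vector to unit L2 norm. A zero vector is returned unchanged
/// to avoid dividing by zero.
fn l2_normalize(v: &[f32]) -> Vec<f32> {
    let norm = l2_norm(v);
    if norm == 0.0 {
        return v.to_vec();
    }
    v.iter().map(|x| x / norm).collect()
}
```

a test on a real embedding would assert that the norm of the output is within a small tolerance of 1.0 and that at least one component is non-zero.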

the “no #[cfg] in cross-cutting interfaces” rule now holds for real. the entire embed pipeline compiles on wasm32 without conditional compilation. reqwest handles the platform difference internally. candle handles the compute. the embedder trait doesn’t know or care.

Quick note, sorry about there mostly being terminal or VSCode images lol, there’s no UI to demo rn…