LocalMode
Privacy-first AI utilities. Run embeddings, vector search, RAG, classification, vision, and LLMs - all locally in the browser.
Built for the Modern Web
AI in the Browser
Run embeddings, classification, vision, and LLMs directly in the browser with WebGPU/WASM.
Privacy-First
Zero telemetry. No data leaves your device. All processing happens locally.
Zero Dependencies
Core package has no external dependencies. Built on native Web APIs.
Offline-Ready
Models cached in IndexedDB. Works without network after first load.
Packages
Modular architecture - use only what you need.
The core package provides the primitives; provider packages plug in backends such as Transformers.js, WebLLM, and PDF.js.
@localmode/core
Vector DB, embeddings, RAG utilities, storage, security, and all core functions.
@localmode/transformers
HuggingFace Transformers.js provider for ML models in the browser.
@localmode/webllm
WebLLM provider for local LLM inference with 4-bit quantized models.
@localmode/pdfjs
PDF text extraction using PDF.js for document processing pipelines.
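As a sketch of how a provider composes with the core utilities, the snippet below extracts text from a user-selected PDF and chunks it for embedding. The extractText import is an assumed @localmode/pdfjs export shown only for illustration; chunk comes from @localmode/core as in the example further down.

// Minimal sketch, assuming an extractText export in @localmode/pdfjs
import { chunk } from '@localmode/core';
import { extractText } from '@localmode/pdfjs'; // assumed API, not documented above

// Take a PDF picked by the user
const input = document.querySelector<HTMLInputElement>('input[type="file"]')!;
const file = input.files![0];

// Pull out the text (assumed signature) and chunk it for embedding
const { text } = await extractText(await file.arrayBuffer());
const chunks = chunk(text, { strategy: 'recursive', size: 512 });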
Simple, Powerful API
Function-first design with TypeScript. All operations return structured results.
Embeddings & Vector Search
$ pnpm add @localmode/core @localmode/transformers
import { createVectorDB, embed, embedMany, chunk } from '@localmode/core';
import { transformers } from '@localmode/transformers';

// Create embedding model
const model = transformers.embedding('Xenova/all-MiniLM-L6-v2');

// Create vector database
const db = await createVectorDB({
  name: 'documents',
  dimensions: 384,
});

// Chunk and embed document
const chunks = chunk(documentText, { strategy: 'recursive', size: 512 });
const { embeddings } = await embedMany({
  model,
  values: chunks.map(c => c.text),
});

// Store in database
await db.addMany(
  chunks.map((c, i) => ({
    id: `chunk-${i}`,
    vector: embeddings[i],
    metadata: { text: c.text },
  }))
);

// Search
const { embedding: queryVector } = await embed({ model, value: 'What is AI?' });
const results = await db.search(queryVector, { k: 5 });
Local LLM Inference
$ pnpm add @localmode/core @localmode/webllm
import { generateText, streamText } from '@localmode/core';
import { webllm } from '@localmode/webllm';

// Create a WebLLM model instance
const model = webllm.chat('Llama-3.2-3B-Instruct-q4f16_1-MLC');

// Generate text (non-streaming)
const { text } = await generateText({
  model,
  messages: [
    { role: 'system', content: 'You are a helpful assistant.' },
    { role: 'user', content: 'What is machine learning?' },
  ],
});
console.log(text);

// Stream text for real-time responses
const { textStream } = await streamText({
  model,
  messages: [
    { role: 'user', content: 'Explain quantum computing in simple terms.' },
  ],
});

for await (const chunk of textStream) {
  // Render each chunk as it arrives (append it to the DOM in a real app)
  console.log(chunk);
}
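Retrieval-Augmented Generation
Putting the two examples together gives a simple RAG flow: embed the question, search the vector database, and pass the retrieved chunks to the chat model as context. This is a minimal sketch built from the calls shown above; the assumption that each search result carries the metadata stored via addMany (h.metadata.text) is not documented here.

import { createVectorDB, embed, generateText } from '@localmode/core';
import { transformers } from '@localmode/transformers';
import { webllm } from '@localmode/webllm';

const embeddingModel = transformers.embedding('Xenova/all-MiniLM-L6-v2');
const chatModel = webllm.chat('Llama-3.2-3B-Instruct-q4f16_1-MLC');
const db = await createVectorDB({ name: 'documents', dimensions: 384 });

// Embed the question and retrieve the closest chunks
const question = 'What is machine learning?';
const { embedding } = await embed({ model: embeddingModel, value: question });
const hits = await db.search(embedding, { k: 5 });

// Assumption: each hit exposes the metadata stored with db.addMany
const context = hits.map(h => h.metadata.text).join('\n---\n');

// Ground the answer in the retrieved context
const { text } = await generateText({
  model: chatModel,
  messages: [
    { role: 'system', content: 'Answer using only the provided context.' },
    { role: 'user', content: `Context:\n${context}\n\nQuestion: ${question}` },
  ],
});
console.log(text);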
Ready to Build?
Start building local-first AI applications with comprehensive documentation, examples, and guides.
Read the Documentation