Answer Engine (Edge)
How small models + prompt refinement compete on quality. A deep dive into the core loop that makes Clausus different.
The Challenge
Large language models (100B+ parameters) deliver impressive results, but they require massive cloud infrastructure, send your data to remote servers, and cost money per token. What if you could get 80-90% of that quality with a model that runs entirely on your laptop?
The Clausus Approach
Instead of brute-forcing answers with billions of parameters, Clausus uses prompt engineering and refinement chains to coax better outputs from smaller models (7B-13B parameters). Think of it as having a skilled intermediary who knows exactly how to ask questions to get the best answers.
The Refinement Loop
Initial Query Analysis
Clausus parses your question, identifies ambiguity, and determines what context is needed. For example: "Summarize the Q3 product roadmap" triggers a search for roadmap documents.
Sub-Prompt Generation
The query is broken into focused sub-tasks. Instead of asking the model to "do everything," we ask it to: (a) extract key themes, (b) identify milestones, (c) summarize risks. Each sub-prompt is optimized for clarity.
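A minimal sketch of this decomposition step. The sub-task templates and the `decompose` helper below are illustrative assumptions, not the actual Clausus internals:

```python
# Hypothetical sketch: break one broad query into focused sub-prompts.
# The task templates are illustrative, not Clausus's real prompt library.

SUB_TASKS = [
    "Extract the key themes from the retrieved context.",
    "Identify concrete milestones and their dates.",
    "Summarize open risks and blockers.",
]

def decompose(query: str) -> list[str]:
    """Turn a broad query into several focused sub-prompts."""
    return [f"{task}\n\nOriginal question: {query}" for task in SUB_TASKS]

prompts = decompose("Summarize the Q3 product roadmap")
```

Each sub-prompt carries the original question for context, so a small model only ever has to solve one narrow task at a time.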
Local Retrieval
Clausus scans your files (with permission) for relevant chunks. Documents are indexed locally, and only matching sections are fed to the model—no full-doc uploads, no cloud processing.
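A self-contained sketch of chunk retrieval. Real local indexes typically rank by embedding similarity; plain token overlap is used here only to keep the example dependency-free:

```python
import re

# Hypothetical sketch of local retrieval: rank pre-indexed chunks by
# token overlap with the query and return only the top matches, so the
# model never sees whole documents.

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def score(query: str, chunk: str) -> int:
    return len(tokens(query) & tokens(chunk))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

index = [
    "Q3 roadmap: ship the new billing API by September.",
    "Team offsite agenda and travel details.",
    "Roadmap risks: billing API depends on the auth migration.",
]
top = retrieve("Q3 roadmap risks", index)
```

Only the matching chunks are forwarded to the model; the offsite agenda never leaves the index.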
Guardrail Checks
Before returning an answer, Clausus runs validation: Does the output cite real sources? Does it contradict known facts? If checks fail, the prompt is refined and retried.
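The check-refine-retry loop can be sketched as below. `ask_model` is a stand-in for whatever local inference call is actually used, and the citation check is deliberately simplified:

```python
# Hypothetical sketch of the guardrail loop: validate the model's answer
# and refine + retry when a check fails.

def has_citation(answer: str, known_sources: set[str]) -> bool:
    """Simplified check: does the answer mention a real source file?"""
    return any(src in answer for src in known_sources)

def answer_with_guardrails(query, ask_model, known_sources, max_retries=2):
    prompt = query
    for _ in range(max_retries + 1):
        answer = ask_model(prompt)
        if has_citation(answer, known_sources):
            return answer
        # Refine: explicitly require citations on the next attempt.
        prompt = query + " Cite your sources by filename."
    return None  # every attempt failed validation

# Fake model for illustration: it only cites when reminded to.
def fake_model(prompt):
    return "OAuth2 is used (api-spec.md)." if "Cite" in prompt else "OAuth2 is used."

result = answer_with_guardrails("Which auth does the API use?",
                                fake_model, {"api-spec.md"})
```

The same pattern extends to other checks (contradiction detection, format validation) by adding predicates alongside `has_citation`.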
Composed Output
Sub-answers are combined into a final, coherent response with citations. You see the answer, the sources, and (optionally) the reasoning chain.
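A sketch of the composition step, assuming sub-answers arrive as (text, source) pairs; the exact internal representation is an assumption:

```python
# Hypothetical sketch of composition: merge sub-answers into one response
# while keeping each answer's source attached as a citation.

def compose(sub_answers: list[tuple[str, str]]) -> str:
    """sub_answers: (text, source) pairs -> one cited response."""
    lines = [f"- {text} [{source}]" for text, source in sub_answers]
    sources = sorted({source for _, source in sub_answers})
    return "\n".join(lines) + "\n\nSources: " + ", ".join(sources)

report = compose([
    ("OAuth2 is used for authentication.", "api-spec.md p.12"),
    ("/users lacks rate limiting.", "api-spec.md p.12"),
])
```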
Memory Update (Opt-In)
If you approve, Clausus stores useful patterns: "User often asks about Q3 roadmaps → prioritize recent planning docs." Memory is explicit, scoped, and deletable.
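The three memory guarantees (explicit, scoped, deletable) can be sketched as a small store. The class and its interface are illustrative, not Clausus's storage layer:

```python
# Hypothetical sketch of opt-in, scoped memory: entries are keyed by
# workspace, stored only on explicit approval, and deletable per scope.

class ScopedMemory:
    def __init__(self):
        self._store: dict[str, list[str]] = {}

    def remember(self, workspace: str, pattern: str, approved: bool) -> None:
        if not approved:  # explicit: nothing is stored without consent
            return
        self._store.setdefault(workspace, []).append(pattern)

    def recall(self, workspace: str) -> list[str]:
        # Scoped: only this workspace's patterns are visible.
        return list(self._store.get(workspace, []))

    def forget(self, workspace: str) -> None:
        # Deletable: a whole scope can be dropped at once.
        self._store.pop(workspace, None)

mem = ScopedMemory()
mem.remember("work", "Q3 questions -> prioritize recent planning docs", approved=True)
mem.remember("personal", "never stored", approved=False)
```

Because recall is keyed by workspace, the "work" pattern above can never surface in a "personal" query.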
Example in Action
User Input:
"What are the main security concerns in our latest API spec?"
Clausus refines to:
- Search: API spec v2.3
- Extract: authentication methods
- Identify: data exposure risks
- Summarize: mitigation strategies
Final Output:
"The API spec (v2.3, page 12) uses OAuth2 but lacks rate limiting on the /users endpoint. This could enable enumeration attacks. Recommendation: Add per-IP throttling (see security-checklist.md, line 47)."
📹 30-second demo video coming soon
How Local File Retrieval Works
Granular Permissions
You explicitly grant Clausus access to folders or file types. No blanket filesystem access.
Smart Chunking
Documents are split into semantic chunks (paragraphs, sections) indexed locally. Only relevant chunks are sent to the model.
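A minimal sketch of paragraph-level chunking with position tracking, so every chunk can later be cited by line number. Production chunkers also split on headings and sections; blank-line splitting keeps the sketch short:

```python
# Hypothetical sketch of chunking: split a document on blank lines
# (paragraph boundaries) and record each chunk's starting line so
# answers can cite back to it.

def chunk(doc: str) -> list[dict]:
    chunks, start_line = [], 1
    for para in doc.split("\n\n"):
        if para.strip():
            chunks.append({"text": para.strip(), "line": start_line})
        start_line += para.count("\n") + 2  # +2 skips the blank separator

    return chunks

doc = ("Intro paragraph.\n\n"
       "Auth uses OAuth2.\nTokens expire hourly.\n\n"
       "Rate limits: none yet.")
pieces = chunk(doc)
```

The recorded `line` field is what lets the final answer point at "page 12" or "line 47" instead of a mystery source.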
Citations Always
Every fact in the answer links back to the source file and line/page number. No mystery sources.
Memory Scoping
Memory is per-workspace or per-task. Your "work" memory doesn't leak into "personal" queries.
System Requirements
CPU / GPU
- Minimum: Intel i5 / AMD Ryzen 5, 8GB RAM
- Recommended: 16GB+ RAM, NVIDIA/AMD GPU

OS
- Supported: macOS 12+, Ubuntu 20.04+, Windows 10/11

Disk & Sandboxing
- Storage: 10GB for models, plus index space
- Isolation: Sandboxed by default