Answer Engine (Edge)
How small models + prompt refinement compete on quality. A deep dive into the core loop that makes Clausus different.
The Challenge
Large language models (100B+ parameters) deliver impressive results, but they require massive cloud infrastructure, send your data to remote servers, and cost money per token. What if you could get 80-90% of that quality with a model that runs entirely on your laptop?
The Clausus Approach
Instead of brute-forcing answers with billions of parameters, Clausus uses prompt engineering and refinement chains to coax better outputs from smaller models (7B-13B parameters). Think of it as having a skilled intermediary who knows exactly how to ask questions to get the best answers.
The Refinement Loop
Initial Query Analysis
Clausus parses your question, identifies ambiguity, and determines what context is needed. For example: "Summarize the Q3 product roadmap" triggers a search for roadmap documents.
Sub-Prompt Generation
The query is broken into focused sub-tasks. Instead of asking the model to "do everything," we ask it to: (a) extract key themes, (b) identify milestones, (c) summarize risks. Each sub-prompt is optimized for clarity.
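A minimal sketch of this decomposition step. The sub-task templates and the `decompose` helper below are illustrative assumptions, not the actual Clausus internals:

```python
# Hypothetical sketch: break one broad query into focused sub-prompts.
# The task templates are illustrative, not Clausus's real prompt library.

SUB_TASKS = [
    "Extract the key themes from the retrieved context.",
    "Identify concrete milestones and their dates.",
    "Summarize open risks and blockers.",
]

def decompose(query: str) -> list[str]:
    """Turn a broad query into several focused sub-prompts."""
    return [f"{task}\n\nOriginal question: {query}" for task in SUB_TASKS]

prompts = decompose("Summarize the Q3 product roadmap")
```

Each sub-prompt carries the original question for context, so a small model only ever has to solve one narrow task at a time.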
Local Retrieval
Clausus scans your files (with permission) for relevant chunks. Documents are indexed locally, and only matching sections are fed to the model—no full-doc uploads, no cloud processing.
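A self-contained sketch of chunk retrieval. Real local indexes typically rank by embedding similarity; plain token overlap is used here only to keep the example dependency-free:

```python
import re

# Hypothetical sketch of local retrieval: rank pre-indexed chunks by
# token overlap with the query and return only the top matches, so the
# model never sees whole documents.

def tokens(text: str) -> set[str]:
    return set(re.findall(r"\w+", text.lower()))

def score(query: str, chunk: str) -> int:
    return len(tokens(query) & tokens(chunk))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

index = [
    "Q3 roadmap: ship the new billing API by September.",
    "Team offsite agenda and travel details.",
    "Roadmap risks: billing API depends on the auth migration.",
]
top = retrieve("Q3 roadmap risks", index)
```

Only the matching chunks are forwarded to the model; the offsite agenda never leaves the index.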
Guardrail Checks
Before returning an answer, Clausus runs validation: Does the output cite real sources? Does it contradict known facts? If checks fail, the prompt is refined and retried.
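The check-refine-retry loop can be sketched as below. `ask_model` is a stand-in for whatever local inference call is actually used, and the citation check is deliberately simplified:

```python
# Hypothetical sketch of the guardrail loop: validate the model's answer
# and refine + retry when a check fails.

def has_citation(answer: str, known_sources: set[str]) -> bool:
    """Simplified check: does the answer mention a real source file?"""
    return any(src in answer for src in known_sources)

def answer_with_guardrails(query, ask_model, known_sources, max_retries=2):
    prompt = query
    for _ in range(max_retries + 1):
        answer = ask_model(prompt)
        if has_citation(answer, known_sources):
            return answer
        # Refine: explicitly require citations on the next attempt.
        prompt = query + " Cite your sources by filename."
    return None  # every attempt failed validation

# Fake model for illustration: it only cites when reminded to.
def fake_model(prompt):
    return "OAuth2 is used (api-spec.md)." if "Cite" in prompt else "OAuth2 is used."

result = answer_with_guardrails("Which auth does the API use?",
                                fake_model, {"api-spec.md"})
```

The same pattern extends to other checks (contradiction detection, format validation) by adding predicates alongside `has_citation`.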
Composed Output
Sub-answers are combined into a final, coherent response with citations. You see the answer, the sources, and (optionally) the reasoning chain.
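A sketch of the composition step, assuming sub-answers arrive as (text, source) pairs; the exact internal representation is an assumption:

```python
# Hypothetical sketch of composition: merge sub-answers into one response
# while keeping each answer's source attached as a citation.

def compose(sub_answers: list[tuple[str, str]]) -> str:
    """sub_answers: (text, source) pairs -> one cited response."""
    lines = [f"- {text} [{source}]" for text, source in sub_answers]
    sources = sorted({source for _, source in sub_answers})
    return "\n".join(lines) + "\n\nSources: " + ", ".join(sources)

report = compose([
    ("OAuth2 is used for authentication.", "api-spec.md p.12"),
    ("/users lacks rate limiting.", "api-spec.md p.12"),
])
```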
Memory Update (Opt-In)
If you approve, Clausus stores useful patterns: "User often asks about Q3 roadmaps → prioritize recent planning docs." Memory is explicit, scoped, and deletable.
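The three memory guarantees (explicit, scoped, deletable) can be sketched as a small store. The class and its interface are illustrative, not Clausus's storage layer:

```python
# Hypothetical sketch of opt-in, scoped memory: entries are keyed by
# workspace, stored only on explicit approval, and deletable per scope.

class ScopedMemory:
    def __init__(self):
        self._store: dict[str, list[str]] = {}

    def remember(self, workspace: str, pattern: str, approved: bool) -> None:
        if not approved:  # explicit: nothing is stored without consent
            return
        self._store.setdefault(workspace, []).append(pattern)

    def recall(self, workspace: str) -> list[str]:
        # Scoped: only this workspace's patterns are visible.
        return list(self._store.get(workspace, []))

    def forget(self, workspace: str) -> None:
        # Deletable: a whole scope can be dropped at once.
        self._store.pop(workspace, None)

mem = ScopedMemory()
mem.remember("work", "Q3 questions -> prioritize recent planning docs", approved=True)
mem.remember("personal", "never stored", approved=False)
```

Because recall is keyed by workspace, the "work" pattern above can never surface in a "personal" query.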
Example in Action
User Input:
"What are the main security concerns in our latest API spec?"
Clausus refines to:
- Search: API spec v2.3
- Extract: authentication methods
- Identify: data exposure risks
- Summarize: mitigation strategies
Final Output:
"The API spec (v2.3, page 12) uses OAuth2 but lacks rate limiting on the /users endpoint. This could enable enumeration attacks. Recommendation: Add per-IP throttling (see security-checklist.md, line 47)."
📹 30-second demo video coming soon
How Local File Retrieval Works
Granular Permissions
You explicitly grant Clausus access to folders or file types. No blanket filesystem access.
Smart Chunking
Documents are split into semantic chunks (paragraphs, sections) indexed locally. Only relevant chunks are sent to the model.
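A minimal sketch of paragraph-level chunking with position tracking, so every chunk can later be cited by line number. Production chunkers also split on headings and sections; blank-line splitting keeps the sketch short:

```python
# Hypothetical sketch of chunking: split a document on blank lines
# (paragraph boundaries) and record each chunk's starting line so
# answers can cite back to it.

def chunk(doc: str) -> list[dict]:
    chunks, start_line = [], 1
    for para in doc.split("\n\n"):
        if para.strip():
            chunks.append({"text": para.strip(), "line": start_line})
        start_line += para.count("\n") + 2  # +2 skips the blank separator

    return chunks

doc = ("Intro paragraph.\n\n"
       "Auth uses OAuth2.\nTokens expire hourly.\n\n"
       "Rate limits: none yet.")
pieces = chunk(doc)
```

The recorded `line` field is what lets the final answer point at "page 12" or "line 47" instead of a mystery source.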
Citations Always
Every fact in the answer links back to the source file and line/page number. No mystery sources.
Memory Scoping
Memory is per-workspace or per-task. Your "work" memory doesn't leak into "personal" queries.
System Requirements
CPU / GPU
- Minimum: Intel i5 / AMD Ryzen 5, 8GB RAM
- Recommended: 16GB+ RAM, NVIDIA/AMD GPU

OS
- Supported: macOS 12+, Ubuntu 20.04+, Windows 10/11

Disk & Sandboxing
- Storage: 10GB for models, plus index space
- Isolation: Sandboxed by default