Working on a legaltech AI project has been eye-opening. Even after building web applications and working on other AI-related projects, this one has reminded me just how different AI systems are from “traditional” web software.
In traditional web apps, things are predictable. Request–response calls, short latencies, and standard scaling patterns. You can usually estimate load, cache results, and scale horizontally without too many surprises. AI changes all of that.
Event-driven infrastructure
AI systems often require event-driven infrastructure. Generating responses can take longer, sometimes significantly longer, depending on the model and the complexity of the task. In some applications, streaming partial responses can be used to reduce perceived latency, but that adds complexity.
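To make the streaming idea concrete, here is a minimal sketch of partial-response streaming using an async generator. The token list and timing are stand-ins for a real inference call; in production the chunks would come from the model server and be flushed to the client (e.g. over SSE or WebSockets) as they arrive.

```python
import asyncio

async def stream_completion(prompt: str):
    # Stand-in for a model call: in a real system this would iterate
    # over tokens coming back from the inference server.
    tokens = ["Reviewing", " the", " contract", " clause", "..."]
    for tok in tokens:
        await asyncio.sleep(0)  # simulates per-token generation latency
        yield tok               # flush each chunk as soon as it exists

async def main():
    chunks = [tok async for tok in stream_completion("summarize clause 4")]
    print("".join(chunks))

asyncio.run(main())
```

The client sees the first token after one generation step instead of waiting for the full response, which is where the reduced perceived latency comes from.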
Scaling in this case isn’t as simple as adding more or bigger nodes to a Kubernetes cluster. You need to think about batching, GPU availability, memory limits, and the cost of inference: factors that barely exist in the traditional web world.
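Batching is the simplest of those levers to illustrate: instead of one GPU forward pass per request, pending requests are grouped so a single pass serves many. This is a count-only sketch; real inference servers typically also batch on a time window and on sequence length.

```python
from itertools import islice

def micro_batches(requests, max_batch_size=8):
    """Group pending inference requests so one GPU forward pass serves many.

    Sketch only: batches by count, ignoring the time-window and
    token-length bucketing a production server would also apply.
    """
    it = iter(requests)
    while batch := list(islice(it, max_batch_size)):
        yield batch

# 20 queued prompts become 3 GPU calls instead of 20
batches = list(micro_batches(range(20), max_batch_size=8))
print([len(b) for b in batches])  # [8, 8, 4]
```

The trade-off is latency versus throughput: larger batches use the GPU better but make the first request in the batch wait for the last.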
Beyond API wrappers
If your goal is more than just calling an AI API, you’re entering a whole new set of technical demands:
- Vectorization and embeddings: Representing text or documents as numeric vectors so models can compare and search them efficiently.
- RAG/CAG pipelines: Retrieving and generating context-aware responses.
- LoRA and fine-tuning: Adapting models to your domain or workflow.
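The retrieval half of a RAG pipeline boils down to nearest-neighbor search over embeddings. Below is a toy illustration with hand-written three-dimensional vectors and cosine similarity; in practice the vectors come from an embedding model and live in a vector database, but the ranking logic is the same.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy "embeddings": in a real pipeline an embedding model produces these.
docs = {
    "nda_clause": [0.9, 0.1, 0.0],
    "payment_terms": [0.1, 0.8, 0.3],
    "liability_cap": [0.2, 0.2, 0.9],
}

def retrieve(query_vec, k=1):
    """Return the k document ids most similar to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
    return ranked[:k]

# The retrieved documents are what gets stuffed into the LLM prompt in RAG.
print(retrieve([0.85, 0.15, 0.05]))  # ['nda_clause']
```

A vector database does exactly this, just with approximate indexes so it stays fast at millions of vectors.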
Each piece has its own quirks and requires careful integration. It’s not just about hooking up an API; it’s about building a system that can reliably handle AI workloads in production.
Our solution: a fully self-hosted event-driven stack
To address these challenges, we designed a fully self-hosted, event-driven infrastructure tailored for legaltech.
- Classic web stack: NestJS + PostgreSQL for the core API and transactional workloads.
- Event handling: NATS JetStream for durable, at-least-once event processing (with message deduplication where exactly-once semantics are needed), which is critical for reliability.
- AI workloads: Python APIs handling vectorization, LLM inference, and other AI logic.
- Vector database: Qdrant for efficient, scalable vector storage and retrieval.
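The contract the event layer gives the AI workers is what matters: a message is redelivered until the consumer acknowledges it, so handlers must be idempotent. The sketch below uses an in-process queue standing in for the broker, purely to show the ack/redelivery loop; with NATS JetStream the broker itself tracks deliveries and redelivers un-acked messages.

```python
import queue

def process_with_redelivery(q, handler, max_deliveries=3):
    """At-least-once processing, sketched in-process.

    A plain queue plays the broker's role here: a failed message is
    put back ("nak") and retried until it succeeds ("ack") or hits
    the delivery limit.
    """
    processed = []
    while not q.empty():
        msg, deliveries = q.get()
        try:
            handler(msg)
            processed.append(msg)             # "ack": done, never redelivered
        except Exception:
            if deliveries + 1 < max_deliveries:
                q.put((msg, deliveries + 1))  # "nak": redeliver later
    return processed

attempts = {}
def flaky_handler(msg):
    # Fails on first delivery to trigger a redelivery; tracks attempts
    # by message id, which is also how you keep real handlers idempotent.
    attempts[msg] = attempts.get(msg, 0) + 1
    if attempts[msg] == 1:
        raise RuntimeError("transient failure")

q = queue.Queue()
for msg_id in ("doc-1", "doc-2"):
    q.put((msg_id, 0))
print(process_with_redelivery(q, flaky_handler))  # ['doc-1', 'doc-2']
```

Both documents are processed despite the first attempt failing, at the cost of each handler running twice: exactly the at-least-once trade-off the real broker gives you.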
Everything is self-hosted, with no cloud dependencies: this is how we guarantee data privacy, which is an absolute must for legaltech applications.
Good News
So the good news is: building fully functional, privacy-safe AI setups is absolutely feasible. It requires a different approach than traditional web apps, but with carefully designed self-hosted (sometimes on-premise 😉) infrastructure, you can avoid the financial pitfalls of running everything in the cloud, along with the speed, reliability, and compliance issues that come with it.
