We’re building an in-memory autocomplete system: single user, no persistence, no concurrency. The features we require are:
- Word insertion (frequency counted)
- Word deletion (frequency decrement + structural pruning)
- Top-K autocomplete by prefix, ranked by highest frequency
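These requirements can be captured as an interface sketch (the method names here are illustrative, not fixed):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class Autocomplete(Protocol):
    """Required surface of the system (names are illustrative)."""

    def insert(self, word: str) -> None: ...          # frequency-counted insertion
    def delete(self, word: str) -> None: ...          # decrement + structural pruning
    def autocomplete(self, prefix: str, k: int = 10) -> list[str]: ...  # top-k by frequency
```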
Here are some initial ideas:
Flat List of Words
words = ["apple", "app", "application"]
Query cost grows with the total dataset: filtering by prefix and collecting all completions requires scanning every word, which is very expensive.
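A minimal sketch of this approach, assuming a frequency-counted flat store (names are illustrative):

```python
from collections import Counter

# Hypothetical flat store: every query scans the whole dataset.
words: Counter = Counter()

def insert(word: str) -> None:
    words[word] += 1

def autocomplete(prefix: str, k: int = 10) -> list[str]:
    # O(N) scan over all words on every query.
    matches = [(w, f) for w, f in words.items() if w.startswith(prefix)]
    matches.sort(key=lambda wf: -wf[1])  # highest frequency first (stable)
    return [w for w, _ in matches[:k]]

insert("apple"); insert("apple"); insert("app"); insert("application")
print(autocomplete("app", 2))  # ['apple', 'app']
```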
Prefix Map
{
  "a": [...],
  "ap": [...],
  "app": [...]
}
Quadratic prefix duplication (every word is stored once per prefix), and it is hard to maintain consistent ranking. Update cost is also very high, since every prefix list for a word must be touched.
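A sketch of why updates are expensive under this scheme (names are illustrative): each insert touches one list per prefix of the word.

```python
from collections import defaultdict

# Hypothetical prefix map: each word is appended to every one of its
# prefixes, so storage and update cost grow with word length squared.
prefix_map: dict[str, list[str]] = defaultdict(list)

def insert(word: str) -> None:
    for i in range(1, len(word) + 1):
        prefix_map[word[:i]].append(word)  # one list update per prefix

insert("app")
insert("apple")
print(sorted(prefix_map))  # every prefix of every word gets its own bucket
```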
Instead, the solution we are going for is to model the data as a trie, which optimises both search and storage. Prefix lookup is fast, collecting completions is just a DFS/BFS from the prefix node, and storage is compact because words with a shared prefix share a path in the tree.
State Model
class Node:
    children: Dict[str, Node]
    is_word: bool
    frequency: int

class Trie:
    root: Node
Node
Represents a prefix. It stores the branching structure (children), a terminal marker (is_word), and a frequency count.
Core Operations
insert(word)
- Traverse characters.
- Create missing nodes.
- Mark final node as word.
- Increment frequency.
delete(word)
Deletion has two phases.
Phase 1 — Logical Removal
- Traverse and store path in stack.
- Decrement frequency.
- If frequency > 0 → stop.
- Else mark is_word = False.
Phase 2 — Structural Pruning
Walk upward using the stack. At each node, stop if the node still has children or node.is_word is set; otherwise remove the node from its parent.
autocomplete(prefix)
- Navigate to the prefix node by traversing characters; if any character is missing, return [].
- Traverse the subtree (BFS/DFS) from the prefix node; treat each terminal node as a candidate word.
- Maintain the top k with a min-heap of size ≤ k storing (frequency, word) pairs.
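The top-k step can be sketched in isolation (a standalone helper, not part of the trie, assuming candidates arrive as (frequency, word) pairs):

```python
import heapq

def top_k(candidates: list[tuple[int, str]], k: int) -> list[str]:
    # Min-heap of size <= k: the weakest kept candidate sits at heap[0],
    # so each new candidate either fits or evicts it in O(log k).
    heap: list[tuple[int, str]] = []
    for freq, word in candidates:
        if len(heap) < k:
            heapq.heappush(heap, (freq, word))
        elif freq > heap[0][0]:
            heapq.heapreplace(heap, (freq, word))
    # Sort the k survivors by frequency, highest first.
    return [word for _, word in sorted(heap, reverse=True)]

print(top_k([(5, "app"), (3, "apple"), (7, "apply"), (1, "ape")], 2))  # ['apply', 'app']
```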
Implementation (Python)
from collections import deque
import heapq
class Node:
    def __init__(self) -> None:
        self.children: dict[str, "Node"] = {}
        self.is_word = False
        self.frequency = 0

    def add_child(self, name: str) -> "Node":
        # Create the child on first access.
        if name not in self.children:
            self.children[name] = Node()
        return self.children[name]


class Trie:
    def __init__(self) -> None:
        self.root = Node()

    def insert(self, word: str) -> None:
        current = self.root
        for ch in word:
            current = current.add_child(ch)
        current.is_word = True
        current.frequency += 1

    def delete(self, word: str) -> None:
        # Phase 1 — logical removal.
        stack: list[tuple[Node, str]] = []  # (parent, edge) pairs along the path
        current = self.root
        for ch in word:
            if ch not in current.children:
                return  # word was never inserted
            stack.append((current, ch))
            current = current.children[ch]
        if not current.is_word:
            return
        current.frequency -= 1
        if current.frequency > 0:
            return
        current.is_word = False
        # Phase 2 — structural pruning, walking back up the stored path.
        while stack:
            parent, ch = stack.pop()
            node = parent.children[ch]
            if node.children or node.is_word:
                break  # node is still needed by another word
            del parent.children[ch]

    def auto_complete(self, prefix: str, k: int = 10) -> list[str]:
        # Navigate to the prefix node.
        current = self.root
        for ch in prefix:
            if ch not in current.children:
                return []
            current = current.children[ch]
        # BFS the subtree, keeping a min-heap of the k best (frequency, word) pairs.
        heap: list[tuple[int, str]] = []
        queue = deque([(current, prefix)])
        while queue:
            node, acc = queue.popleft()
            if node.is_word:
                if len(heap) < k:
                    heapq.heappush(heap, (node.frequency, acc))
                elif node.frequency > heap[0][0]:
                    heapq.heapreplace(heap, (node.frequency, acc))
            for ch, child in node.children.items():
                queue.append((child, acc + ch))
        # Pop in ascending order, then reverse for highest-frequency first.
        result = []
        while heap:
            result.append(heapq.heappop(heap)[1])
        return result[::-1]
Tradeoffs
This design:
- Shares prefix storage, so memory overhead stays low when words overlap
- Makes insert and delete O(len(word))
- Computes ranking lazily, at query time
But:
- Autocomplete cost grows with the size of the prefix subtree
- No caching of query results
- Not optimal for read-heavy workloads, since every query re-walks the subtree