Nader Elnagar

Building a Seamless Blog Engine: Obsidian-First Markdown, MDX, and Build-Time Highlighting

nextjs
mdx
obsidian
shiki
react-19


I write almost everything in Obsidian. Notes, drafts, half-thoughts, snippets — they all live in a single vault, with backlinks, embedded images, and callouts. The problem is that Obsidian's flavor of Markdown isn't standard Markdown: ![[image.png]] embeds for images, > [!NOTE] for callouts, wiki-links everywhere. Pushing one of those files through a normal MDX pipeline produces broken images and ugly blockquotes.

So instead of changing how I write, I built a thin engine that lets me drop my Obsidian notes straight into /content/blog and have them render correctly — admonitions, embedded images, code blocks, the whole thing.

This post is both a tour of how that engine works and a worked example of every feature it supports, because every code block, callout, and image you're about to see is generated by the same pipeline you're reading about.


The problem

The brief was deceptively simple:

  1. Authoring should feel like Obsidian. No frontmatter dance, no special fences, no "now port your callouts to MDX components" step.
  2. Output should feel like a hand-crafted Next.js site. Optimized images, dual-theme syntax highlighting, instant page loads, real SEO, no client-side JavaScript for static content.
  3. Adding a post should be a single git commit. No CMS, no database, no build step beyond next build.

The catch is that goals 1 and 2 fight each other. Obsidian markdown isn't a strict superset of CommonMark — ![[file.png]] will trip up any standard parser, and > [!WARNING] parses as a perfectly valid (and perfectly ugly) blockquote. Bridging the two is the entire job.


The architecture in one diagram

Every post moves through a four-stage pipeline. Each stage has exactly one job, which keeps the system debuggable.

┌────────────────────┐   ┌─────────────────────┐   ┌──────────────────────┐   ┌─────────────────┐
│ /content/blog/*.md │──▶│ String preprocessor │──▶│ MDX + plugin pipeline│──▶│ React renderers │
│  (Obsidian native) │   │  (lib/mdx.ts)       │   │  (remark + rehype)   │   │ (mdx-components)│
└────────────────────┘   └─────────────────────┘   └──────────────────────┘   └─────────────────┘
        author-time           build-time, regex          build-time, AST            render-time

The split between string-level and tree-level transformations is the most important design decision in the codebase, so let's walk through why.


Stage 1 — String preprocessing

Some Obsidian quirks aren't valid Markdown at all. ![[diagram.png]] doesn't parse — there's no AST node that represents it — so by the time MDX hands you a tree, it's too late: the syntax has already been mangled into plain text.

The fix is to rewrite Obsidian syntax into standard Markdown before the parser sees it, with a single regex pass:

export function preprocessObsidianMarkdown(raw: string): string {
  return raw.replace(
    /!\[\[([^\]|]+?)(?:\|([^\]]+))?\]\]/g,
    (_match, filename: string, alias?: string) => {
      const trimmedFile = filename.trim();
      const altFromName =
        trimmedFile.split("/").pop()?.replace(/\.[^.]+$/, "") ?? trimmedFile;
      const alt = (alias?.trim() || altFromName).replace(/[[\]]/g, "");
      const src = trimmedFile.startsWith("/") ? trimmedFile : `/${trimmedFile}`;
      return `![${alt}](${src})`;
    }
  );
}

That single regex covers three Obsidian shapes and rewrites them all into vanilla Markdown:

Input                              Rewritten to
![[diagram.png]]                   ![diagram](/diagram.png)
![[notes/architecture.png]]        ![architecture](/notes/architecture.png)
![[diagram.png|System diagram]]    ![System diagram](/diagram.png)

After this pass, the rest of the pipeline never has to know Obsidian exists. Every downstream tool — remark, rehype, the next/image wrapper — sees a normal image node and behaves correctly.

This is also where I'd add ==highlight== marks, [[wiki-links]] for cross-post navigation, or Obsidian's ^block-id references — anything that can be massaged into valid Markdown belongs in stage one.
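As a sketch of what such an additional stage-one pass might look like — the function name, the <mark> rewrite, and the /blog/ URL scheme are all assumptions, not the engine's actual code:

```typescript
// Hypothetical stage-one extension pass. The function name, the <mark>
// rewrite, and the /blog/ URL scheme are assumptions, not the engine's code.
export function preprocessExtras(raw: string): string {
  return (
    raw
      // ==text== -> <mark>text</mark> (MDX passes raw HTML through)
      .replace(/==([^=\n]+)==/g, "<mark>$1</mark>")
      // [[slug|label]] or [[slug]] -> [label](/blog/slug); the negative
      // lookbehind skips ![[...]] image embeds handled by the main pass
      .replace(
        /(?<!!)\[\[([^\]|]+?)(?:\|([^\]]+))?\]\]/g,
        (_m, slug: string, label?: string) =>
          `[${(label ?? slug).trim()}](/blog/${slug.trim()})`
      )
  );
}
```

The same principle applies as for image embeds: once the pass has run, everything downstream sees ordinary Markdown and HTML.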


Stage 2 — MDX with a plugin pipeline

Once the string is standards-compliant Markdown, it goes into next-mdx-remote/rsc. The interesting part is the plugin list:

<MDXRemote
  source={post.content}
  options={{
    mdxOptions: {
      remarkPlugins: [remarkGfm],
      rehypePlugins: [
        [rehypePrettyCode, rehypePrettyCodeOptions],
        rehypeCodeRaw,
      ],
    },
    parseFrontmatter: false,
  }}
  components={mdxComponents}
/>

Three plugins, each doing one thing:

  • remarkGfm — adds GitHub-flavored Markdown features: tables, strikethrough, task lists, autolinks. Standard stuff, but the table you saw a moment ago wouldn't render without it.
  • rehypePrettyCode — runs every fenced code block through Shiki at build time, producing fully tokenized HTML with two themes baked in (github-light for light mode, github-dark-dimmed for dark). Zero client-side JS for highlighting.
  • rehypeCodeRaw — a tiny custom plugin (more on this below) that walks the highlighted output and stamps the original source onto each <figure> so the copy button has something to copy.

parseFrontmatter: false is important: we already stripped frontmatter with gray-matter upstream, and letting MDX try again would either fail or double-handle it.
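For reference, the upstream split is roughly this shape — a minimal stand-in for gray-matter, which is what the engine actually uses; the helper name and return shape here are illustrative:

```typescript
// Minimal stand-in for the gray-matter split the engine does upstream.
// The real pipeline uses gray-matter; this sketch only shows the contract:
// peel a leading --- block off the file and hand the body to the preprocessor.
export function splitFrontmatter(raw: string): {
  frontmatter: string;
  body: string;
} {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?/.exec(raw);
  if (!match) return { frontmatter: "", body: raw };
  return { frontmatter: match[1], body: raw.slice(match[0].length) };
}
```

Because the body handed to MDXRemote has already been stripped this way, turning parseFrontmatter back on would make MDX look for a --- block that no longer exists.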


Stage 3 — Build-time syntax highlighting

The highlighting story deserves its own section because it's where the engine punches above its weight.

Dual themes, zero client JS

Shiki tokenizes at build time and emits inline styles using two CSS variables per token: --shiki-light and --shiki-dark (plus *-bg for backgrounds). A small block of CSS in globals.css picks the right one based on the active color scheme:

[data-rehype-pretty-code-figure] pre {
  background-color: var(--shiki-light-bg);
  color: var(--shiki-light);
}
 
.dark [data-rehype-pretty-code-figure] pre {
  background-color: var(--shiki-dark-bg);
  color: var(--shiki-dark);
}

Toggle the theme switcher in the top-right and watch every code block on this page re-color instantly — no flash, no re-tokenization, no JavaScript. Just CSS variables.
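For completeness, the rehypePrettyCodeOptions object referenced in the plugin list plausibly looks like this — the two theme names come from this post, while keepBackground and defaultLang are assumptions about the config, not confirmed values:

```typescript
import type { Options } from "rehype-pretty-code";

// Plausible shape of the rehypePrettyCodeOptions object referenced earlier.
// The light/dark keys are what produce the --shiki-light/--shiki-dark
// variable pairs the CSS above switches between.
export const rehypePrettyCodeOptions: Options = {
  theme: {
    light: "github-light",
    dark: "github-dark-dimmed",
  },
  keepBackground: false, // assumption: let the CSS variables own backgrounds
  defaultLang: "plaintext", // assumption: fences with no language still render
};
```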

Line numbers, line highlights, word highlights

You've already seen all three demonstrated in this post. The fence syntax matches Shiki's conventions:

You write                  You get
```ts                      Plain syntax highlighting
```ts showLineNumbers      Adds gutter line numbers
```ts {1,3-5}              Highlights lines 1, 3, 4, 5
```ts /useState/           Highlights every useState token
`value{:ts}`               Inline-highlights with TS coloring

Mixing them is fine — the TSX block earlier in this post uses showLineNumbers {3,7-12} together. The combinatorial fence syntax is the kind of thing that would be a nightmare to implement from scratch, which is why we lean on rehype-pretty-code for it.

The copy button — and the plugin we had to write

We wanted a "Copy" button on every code block. The trouble is that Shiki shreds the original source into a tree of tokenized <span>s. Once that's done, there's no data-source attribute or hidden text node holding the original — the source has been atomized. Trying to reconstruct it from the DOM at click time is fragile (whitespace, ligatures, and highlighted spans all get in the way).

Older versions of rehype-pretty-code exposed a __rawString__ property for exactly this, but it was removed. So we wrote a 50-line companion rehype plugin that runs after the highlighter and walks every highlighted figure, joining the leaf-text back together:

import { visit } from "unist-util-visit";
import type { Element, Root } from "hast";
import type { Plugin } from "unified";
 
export const rehypeCodeRaw: Plugin<[], Root> = () => (tree) => {
  visit(tree, "element", (node: Element) => {
    if (node.tagName !== "figure") return;
    if (!node.properties?.["dataRehypePrettyCodeFigure"]) return;
 
    let raw = "";
    visit(node, "text", (textNode) => {
      raw += textNode.value;
    });
 
    node.properties["dataRaw"] = raw;
  });
};

That data-raw attribute makes it to the rendered HTML, where a tiny client component (<CopyButton>) reads it and writes to the clipboard. The button is the only client-side JavaScript in the entire blog reading experience.


Stage 4 — React component overrides

After all that processing, the renderer emits standard HTML tags. The mdxComponents map swaps each one for a custom React component that handles the design system, accessibility, and the last few Obsidian quirks.

Images

import Image from "next/image";

function MdxImage({ src, alt }: { src?: string; alt?: string }) {
  if (!src) return null;
  const isExternal = /^https?:\/\//.test(src) || src.startsWith("//");
  if (isExternal) return <img src={src} alt={alt ?? ""} />;
  return <Image src={src} alt={alt ?? ""} width={1200} height={630} />;
}

This is why stage one always emits a leading slash: it guarantees the path goes through the next/image branch and gets optimized, sized, and lazy-loaded. External images fall back to a plain <img> so we don't have to whitelist domains in next.config.js.

Admonitions — the trickiest one

Obsidian's callout syntax is just a blockquote whose first line happens to be [!TYPE] Optional title:

> [!WARNING] Heads up
> This will overwrite your changes.

That's valid Markdown — MDX parses it into a perfectly normal <blockquote><p>...</p></blockquote>. So unlike the wiki-link case, we can't fix this at the string level without effectively re-implementing a Markdown parser to find blockquote boundaries.

Instead, the override does the detection at render time:

  1. Walk the React children tree with a getNodeText helper that recurses into elements and joins text nodes. Run the [!TYPE] title regex against the result.
  2. If matched — extract kind (lowercased) and title, then call stripAdmonitionMarker(children) to recursively walk the tree and remove the [!TYPE] title text from the first text node only. Everything after stays intact: paragraphs, lists, links, code, even nested admonitions.
  3. Render an <Admonition> — look up the styling (border, background, icon) in an ADMONITION_STYLES map keyed by kind, drop the cleaned children inside.
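A sketch of the detection half of steps 1 and 2, with React children modeled as plain values — getNodeText matches the helper named above, while the NodeLike type and detectAdmonition are illustrative:

```typescript
// Sketch of the render-time detection in steps 1-2, with React children
// modeled as plain values. getNodeText matches the helper named above;
// NodeLike and detectAdmonition are illustrative, not the engine's code.
type NodeLike =
  | string
  | number
  | null
  | undefined
  | NodeLike[]
  | { props?: { children?: NodeLike } };

export function getNodeText(node: NodeLike): string {
  if (node == null) return "";
  if (typeof node === "string" || typeof node === "number") return String(node);
  if (Array.isArray(node)) return node.map(getNodeText).join("");
  return getNodeText(node.props?.children);
}

// Matches the `[!TYPE] Optional title` marker on the blockquote's first line.
const ADMONITION_RE = /^\s*\[!(\w+)\]\s*(.*)$/m;

export function detectAdmonition(children: NodeLike) {
  const match = ADMONITION_RE.exec(getNodeText(children));
  if (!match) return null;
  return { kind: match[1].toLowerCase(), title: match[2].trim() };
}
```

If detection fails, the override just renders a plain blockquote, so ordinary quotes are unaffected.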

Eight kinds are supported: NOTE, TIP, INFO, WARNING, IMPORTANT, CAUTION, DANGER, SUCCESS — each with its own border, background, and icon from the ADMONITION_STYLES map.


Bugs we hit along the way

Three issues bit us in production. Each one is worth knowing about because the symptom and the cause are far apart.

Bug 1 — Each child in a list should have a unique key

When stripAdmonitionMarker cleaned the blockquote body and handed an array of children to React.cloneElement, React 19 logged a key warning for every item.

The fix is one line:

const finalChildren = Array.isArray(cleanedChildren)
  ? React.Children.toArray(cleanedChildren)
  : cleanedChildren;
return React.cloneElement(node, undefined, finalChildren);

React.Children.toArray synthesizes stable, position-based keys, which is exactly what cloneElement wants when handed a list.

Bug 2 — Encountered a script tag while rendering

We wanted JSON-LD structured data for SEO. The obvious approach was a <script type="application/ld+json"> inside a manual <head> block in app/layout.tsx. React 19 immediately complained, because scripts injected as JSX children never execute.

Moving the script into <body> made the warning go away, but introduced…

Bug 3 — Hydration mismatch on the mobile menu button

The error pointed at a Radix <Sheet> trigger button:

Hydration failed because the server rendered HTML didn't match the client.
- aria-controls="radix-_R_4qlb_"
+ aria-controls="radix-_R_6qlb_"

The button was fine. The cause was the JSON-LD <script> we'd just moved into <body>. React 19 hoists script resources during hydration, which shifts the Fiber index that Radix's useId() uses, so the server and client end up disagreeing on the auto-generated ID.

The fix was to render the JSON-LD via next/script with strategy="afterInteractive". next/script injects the tag outside React's reconciliation tree, so it can't shift sibling Fiber indices:

<Script
  id="ld-person"
  type="application/ld+json"
  strategy="afterInteractive"
  dangerouslySetInnerHTML={{ __html: JSON.stringify(personSchema) }}
/>

Three bugs, three completely different files, all caused by the same chain of consequences from one inline <script> tag. That's React 19 in 2026.


What the engine doesn't do (yet)

Honest scope is the best feature.

Feature                 Notes
Drafts                  draft: true in frontmatter excludes the post from build
Tags / categories       Listed on cards; tag pages would be a 20-line addition
RSS                     A route.ts over getAllPosts() is a one-afternoon job
Search                  Honestly fine without — there are <50 posts
Comments                Would need a backend; not worth the complexity
Math (KaTeX)            Add remark-math + rehype-katex if a post needs it
Mermaid / diagrams      Same — plugin-shaped problem, not engine-shaped
Wiki-link cross-refs    [[other-post]] rewrites would go in stage one
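To make the RSS row concrete, here is a sketch of the XML half of such a route.ts — the channel title, the PostMeta field names, and the shape getAllPosts() would return are all assumptions:

```typescript
// Assumed per-post metadata shape; the real getAllPosts() may differ.
type PostMeta = { slug: string; title: string; date: string };

// Pure XML builder; a route.ts would call this over getAllPosts() and wrap
// the result in a Response with Content-Type: application/rss+xml.
export function buildRssXml(siteUrl: string, posts: PostMeta[]): string {
  const items = posts
    .map(
      (p) =>
        `<item>` +
        `<title>${p.title}</title>` +
        `<link>${siteUrl}/blog/${p.slug}</link>` +
        `<pubDate>${new Date(p.date).toUTCString()}</pubDate>` +
        `</item>`
    )
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<rss version="2.0"><channel>` +
    `<title>Blog</title><link>${siteUrl}</link>` +
    items +
    `</channel></rss>`
  );
}
```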

The engine is intentionally a core with extension points, not a framework. If a post needs math, that one post can opt into the plugin; everything else stays light.


What I'd tell someone building this from scratch

A few things I'd get right on the first try if I did it again:

  1. Pre-process at the string level for syntax-incompatible features (wiki-links, embedded blocks). Don't try to patch the AST after the fact.
  2. Override at the component level for compatible features (admonitions, custom links, callouts). Don't try to invent grammar.
  3. Highlight code at build time, always. The page-weight and FCP wins compound across every post, every visit.
  4. Use next/script for anything that emits a real <script> tag in modern React. useId() collisions are not worth the debugging time.
  5. Keep the contract for authors as small as possible. Frontmatter + filename = published post. Anything more is friction.
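Point 5's contract can be sketched in a few lines — the directory convention and helper name here are illustrative, not the engine's exact code:

```typescript
import { readdirSync } from "node:fs";

// The whole authoring contract: the filename is the slug, and presence in
// the content directory is publication. Helper name is an assumption.
export function listPostSlugs(dir: string): string[] {
  return readdirSync(dir)
    .filter((f) => f.endsWith(".md"))
    .map((f) => f.replace(/\.md$/, ""))
    .sort();
}
```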

The whole engine — preprocessor, plugins, components, and styles — is under 800 lines. Posts are pure Markdown. The build is next build. That's the entire system.


Try it yourself

Every feature in this post — the wiki-link image rewriting, the eight admonition kinds, dual-theme syntax highlighting with line numbers and word highlights, the copy buttons, the GFM tables — works the same way for any post you drop into /content/blog. There's no hidden second system, no per-post configuration, and no client-side JavaScript involved in rendering anything you've read above.

That was the goal: make the file the source of truth, and let the engine disappear.