I really tried to build a YAML Parser

I wanted to build something different: a parser that could understand YAML and eventually convert it to JSON. Of course, I didn’t know what was involved. With two weeks of focused work, I was sure I could do this...

I settled on this approach:

  • Research an algorithm to effectively parse YAML line by line
  • Implement the simplest version and publish to npm
  • Read the YAML spec and add support for more complex language features

Implementation

Finally, a great excuse to try Bun. It promised everything with almost no setup. I hacked together a basic algorithm to handle line-by-line parsing. It took a little more time, but I got indentation working properly too. Here’s what the core function looked like:

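export function parseYAML(lines: string[], depth: number = 0) {
    const result: Record<string, any> = {}
    while (lines.length) {
        const currentLine = lines.shift()!;
        // Skip blank lines so they don't turn into empty keys
        if (!currentLine.trim()) continue;
        const currentLineDepth = currentDepth(currentLine);
        if (currentLineDepth < depth) {
            // Dedent: hand the line back so the caller can process it
            lines.unshift(currentLine);
            break
        } else {
            let { key, value } = parseLine(currentLine)
            if (!value?.trim()) {
                // No inline value: only lines indented deeper than this one belong
                // to the nested map, so siblings of this key are not swallowed
                value = parseYAML(lines, currentLineDepth + 1)
            }

            result[key.trim()] = value;
        }
    }

    return result;
}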

And the function to calculate and return the current line depth:

export function currentDepth(line: string) {
    const withoutSpace = line.trimStart();
    return line.length - withoutSpace.length
}

The function to parse a line and return either the key and value, or just the key when no value exists:

export function parseLine(line: string): Record<any, any> {
    // Split on the first colon only, so values that contain ":" (URLs, times) stay intact
    const separator = line.indexOf(":")
    if (separator === -1)
        return { key: line }

    return { key: line.slice(0, separator), value: line.slice(separator + 1).trim() }
}

Then the main entry file looked like this…

export async function parse(filePath: string) {
    const file = Bun.file(filePath);
    const content = await file.text();
    const lines = content.split("\n");
    // Call the parser by its actual name, parseYAML
    const json = parseYAML(lines);
    const fileName = file.name?.split('.yml')[0];
    // Bun.write returns a promise, so wait for the file to be written
    await Bun.write(`${fileName}.json`, JSON.stringify(json, null, 2))
}
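
To make the behaviour concrete, here is a condensed, in-memory sketch of the same line-by-line idea run on a tiny document. The names `indentOf`, `parseBlock`, and `YamlMap` are made up for this illustration and are not part of the published package; it only handles plain nested `key: value` maps.

```typescript
// Condensed sketch of the line-by-line approach (illustrative names only).
type YamlMap = { [key: string]: string | YamlMap };

// Number of leading whitespace characters on a line.
function indentOf(line: string): number {
    return line.length - line.trimStart().length;
}

function parseBlock(lines: string[], depth = 0): YamlMap {
    const result: YamlMap = {};
    while (lines.length) {
        const line = lines.shift()!;
        if (!line.trim()) continue; // ignore blank lines
        if (indentOf(line) < depth) {
            lines.unshift(line); // dedent: let the caller handle this line
            break;
        }
        const separator = line.indexOf(":");
        const key = (separator === -1 ? line : line.slice(0, separator)).trim();
        const value = separator === -1 ? "" : line.slice(separator + 1).trim();
        // An empty value means the more-indented lines that follow form a nested map.
        result[key] = value || parseBlock(lines, indentOf(line) + 1);
    }
    return result;
}

const sample = ["name: demo", "server:", "  host: localhost", "  port: 8080", "debug: true"];
const parsed = parseBlock([...sample]);
console.log(JSON.stringify(parsed, null, 2));
```

Every scalar comes out as a string here (`"8080"`, `"true"`), whereas a real YAML parser would resolve those to a number and a boolean. That is one of the many things this approach leaves out.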

I was a little bit proud of myself, so I published the package and went to bed, ready for the next day.

Next Day

I started reading the YAML spec line by line. The more I read, the more confused I became. As someone who’s never worked with YAML extensively, I thought it was just keys, values, and indentation. I was wrong on so many levels.

How on earth was I going to reason about this?

? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]

Or this

--- >
 Sammy Sosa completed another
 fine season with great stats.

   63 Home Runs
   0.288 Batting Average

 What a year!

It became evident that there were a lot of edge cases, and if I wanted to build something useful, I would have to think deeply about it. I stumbled on a guy building something similar in Rust who had a YouTube playlist of 47 videos about it. At this point, I knew the idea wouldn’t work out. I was looking for a side project I could build within a week or two, and this was not it.

Giving up

I would have to spend a lot of time researching parsers to build something remotely useful. I’m currently job hunting and have a lot of low-hanging fruit to pick first. Besides, it’s not a problem for most people, since there are about one and a half libraries that already do it perfectly.

Demo Image

But I had become so invested in the idea that I didn’t want to just walk away. So I took a few hours to build a web app that converts YAML to JSON using one of the popular libraries, js-yaml. My long-term plan is to study parsers more and try to replace the library layer some day. See a live demo here.

It’s also open source and you can find the code here.
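
For reference, the conversion step with js-yaml can be as small as the sketch below. It assumes js-yaml v4, where `load` parses a YAML string into plain JavaScript values; the `yamlToJson` wrapper is a hypothetical name for this illustration, not code taken from the app.

```typescript
import { load } from "js-yaml";

// Hypothetical helper: parse YAML text and pretty-print the result as JSON.
function yamlToJson(source: string): string {
    const data = load(source); // throws if the input is not valid YAML
    return JSON.stringify(data, null, 2);
}

const yamlText = ["server:", "  host: localhost", "  port: 8080", "debug: true"].join("\n");
const jsonText = yamlToJson(yamlText);
console.log(jsonText);
```

Unlike the hand-rolled parser, the library resolves scalars to their types, so `port` comes back as a number and `debug` as a boolean.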