I wanted to build something different: a parser that could understand YAML and eventually convert it to JSON. Of course, I didn’t know what was involved. With two weeks of focused work, I was sure I could pull it off...
I settled on this approach:
- Research an algorithm to effectively parse YAML line by line
- Implement the simplest version and publish to npm
- Read the YAML spec and add support for more complex language features
Implementation
Finally, a great excuse to try Bun. It promised everything with almost no setup. I hacked together a basic algorithm to handle line-by-line parsing. It took a little more time, but I got indentation working too. Here’s what the core function looked like:
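export function parseYAML(lines: string[], depth: number = 0) {
  const result: Record<string, any> = {};
  while (lines.length) {
    const currentLine = lines.shift() as string;
    // Skip blank lines so they do not turn into empty keys
    if (!currentLine.trim()) continue;
    const currentLineDepth = currentDepth(currentLine);
    if (currentLineDepth < depth) {
      // This line belongs to an enclosing block: put it back and stop
      lines.unshift(currentLine);
      break;
    }
    let { key, value } = parseLine(currentLine);
    if (!value?.trim()) {
      // No inline value, so the more-indented lines that follow form a nested object.
      // Recursing with depth + 1 keeps siblings of this key out of the nested block.
      value = parseYAML(lines, currentLineDepth + 1);
    }
    result[key.trim()] = value;
  }
  return result;
}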
And the function that calculates a line’s depth, i.e. the number of leading whitespace characters:
export function currentDepth(line: string) {
  const withoutSpace = line.trimStart();
  return line.length - withoutSpace.length;
}
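The function that parses a line returns both the key and the value, or just the key when the line has no value:
export function parseLine(line: string): Record<any, any> {
  // Split on the first colon only, so values that contain ":" (URLs, times) stay intact
  const separatorIndex = line.indexOf(":")
  if (separatorIndex === -1) return { key: line }
  return {
    key: line.slice(0, separatorIndex),
    value: line.slice(separatorIndex + 1).trim(),
  }
}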
Then the main entry file looked like this…
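export async function parse(filePath: string) {
  const file = Bun.file(filePath);
  const content = await file.text();
  const lines = content.split("\n");
  const json = parseYAML(lines);
  // Derive the output name from the input path by dropping the .yml/.yaml extension
  const fileName = filePath.replace(/\.ya?ml$/, "");
  // Bun.write returns a promise, so wait until the file has been written
  await Bun.write(`${fileName}.json`, JSON.stringify(json, null, 2));
}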
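To make the flow concrete, here is a minimal, self-contained sketch of the same line-by-line idea running on an in-memory string instead of a file. It is a condensed variant for illustration, not the published package: the names `depthOf`, `splitLine`, `parseBlock` and `yamlTextToJson` are made up for this example, and it assumes only plain `key: value` pairs and space-indented nesting.

```typescript
// Condensed variant of the line-by-line approach (illustrative names, not the published code).
type YamlObject = { [key: string]: string | YamlObject };

// Depth = number of leading whitespace characters on the line.
function depthOf(line: string): number {
  return line.length - line.trimStart().length;
}

// Split on the first colon only; an empty value means a nested block follows.
function splitLine(line: string): { key: string; value: string } {
  const i = line.indexOf(":");
  if (i === -1) return { key: line.trim(), value: "" };
  return { key: line.slice(0, i).trim(), value: line.slice(i + 1).trim() };
}

// Consume lines from the front of the array until one is shallower than minDepth.
function parseBlock(lines: string[], minDepth = 0): YamlObject {
  const result: YamlObject = {};
  while (lines.length > 0) {
    const line = lines.shift() as string;
    if (line.trim() === "") continue; // ignore blank lines
    const depth = depthOf(line);
    if (depth < minDepth) {
      lines.unshift(line); // belongs to an enclosing block
      break;
    }
    const { key, value } = splitLine(line);
    result[key] = value !== "" ? value : parseBlock(lines, depth + 1);
  }
  return result;
}

export function yamlTextToJson(text: string): string {
  return JSON.stringify(parseBlock(text.split("\n")), null, 2);
}

const sample = [
  "name: demo",
  "server:",
  "  host: localhost",
  "  url: http://localhost:8080",
  "debug: true",
].join("\n");

console.log(yamlTextToJson(sample));
```

For the sample above, `server` becomes a nested object with `host` and `url`, while `debug` comes out as the string `"true"`, because this sketch does no type resolution at all.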
I was a little proud of myself, so I published the package and went to bed, ready for the next day.
Next Day
I started reading the YAML spec section by section. The more I read, the more confused I became. As someone who’s never worked with YAML extensively, I thought it was just keys, values and indentation. I was wrong on many levels.
How on earth was I going to reason about this?
? [ New York Yankees,
    Atlanta Braves ]
: [ 2001-07-02, 2001-08-12,
    2001-08-14 ]
Or this
--- >
  Sammy Sosa completed another
  fine season with great stats.

    63 Home Runs
    0.288 Batting Average

  What a year!
It became evident that there were a lot of edge cases, and that if I wanted to build something useful, I would have to think deeply about it. I stumbled on someone building something similar in Rust who had a YouTube playlist of 47 videos on it. At that point, I knew this idea wasn’t going to work out. I was looking for a side project I could build within a week or two, and this was not it.
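One way to see how far those two snippets sit from plain `key: value` lines is to scan them for constructs a line-by-line parser has no answer for. The sketch below is a rough heuristic, not a reading of the spec grammar: the function name `unsupportedFeatures`, the feature labels and the regular expressions are all assumptions made up for this example, and it will misfire on things like brackets inside quoted strings.

```typescript
// Rough heuristic only: labels and patterns are illustrative, not taken from the YAML grammar.
const FEATURE_PATTERNS: Array<[string, RegExp]> = [
  ["document marker", /^---(\s|$)/],          // a line starting with ---
  ["complex key", /^\s*\?(\s|$)/],            // a line starting with "? "
  ["block scalar", /(^|\s)[>|][+-]?\d?\s*$/], // a line ending in > or |
  ["flow collection", /[\[\]{}]/],            // inline [ ... ] or { ... }
  ["sequence entry", /^\s*-(\s|$)/],          // a line starting with "- "
];

// Returns the labels of every construct seen, in order of first appearance.
export function unsupportedFeatures(yamlText: string): string[] {
  const found = new Set<string>();
  for (const line of yamlText.split("\n")) {
    for (const [label, pattern] of FEATURE_PATTERNS) {
      if (pattern.test(line)) found.add(label);
    }
  }
  return Array.from(found);
}

const complexKeyExample = [
  "? [ New York Yankees,",
  "    Atlanta Braves ]",
  ": [ 2001-07-02, 2001-08-12,",
  "    2001-08-14 ]",
].join("\n");

console.log(unsupportedFeatures(complexKeyExample));
console.log(unsupportedFeatures("--- >\n  Sammy Sosa completed another"));
```

The first call reports a complex key and a flow collection, and the second a document marker and a block scalar. None of these fit the assumption that every line is one key with an optional value.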
Giving up
I would have to spend a lot of time researching parsers to build something even remotely useful. I’m currently job hunting and have plenty of low-hanging fruit to pick first. Besides, it’s not a problem for most people, since there are already about one and a half libraries that do it perfectly.
But I had become so invested in the idea that I didn’t want to just walk away. So I took a few hours to build a web app that converts YAML to JSON using one of the popular libraries, js-yaml. My long-term plan is to study parsers more and try to replace the library layer some day. See a live demo here.
It’s also open source and you can find the code here.
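For anyone curious what the conversion layer of such an app can look like, here is a minimal sketch. It is not the code from the app: `convertYamlToJson` and the `YamlLoader` type are hypothetical names for this example. The loader is passed in as a parameter so the function stays independent of any one library, and js-yaml’s `load` export fits that shape.

```typescript
// `YamlLoader` is a hypothetical type for this sketch: any function that turns
// YAML text into a plain JavaScript value. js-yaml's `load` export fits it.
type YamlLoader = (text: string) => unknown;

export function convertYamlToJson(
  yamlText: string,
  load: YamlLoader,
  indent: number = 2,
): string {
  // A loader may return undefined for an empty document, which JSON cannot
  // represent, so fall back to null in that case.
  const data = load(yamlText) ?? null;
  return JSON.stringify(data, null, indent);
}

// With js-yaml installed, usage would look roughly like this:
//   import { load } from "js-yaml";
//   const json = convertYamlToJson("name: demo", load);
```

Keeping the loader behind a parameter also leaves room for the long-term plan: a hand-written parser could be swapped in later without touching the rest of the app.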