So, I decided to write a BUFR decoder... Part 1
Kelton Halbert
Published: 2025-07-03
Contents
- Am I Insane?!
- What is BUFR, anyway?
- What would possess me to write a BUFR decoder from scratch??
- Concluding Thoughts and What’s Coming in Part 2
Am I Insane?!
Maybe. Probably, even. If you don’t already know what BUFR is, you will probably share that opinion after you learn more about it. However, if you take some time to hear me out, I think that maybe — just maybe — something really cool and useful can come from this. At the very least, you might learn something about a really fascinating and powerful programming language called Zig. You may also learn a thing or two about the BUFR format, though if you are looking for any answers as to why we are still using this format, that’s a better question for the World Meteorological Organization. In the absolute worst case scenario, you can point and laugh as you watch my descent into madness and get some entertainment out of it all.
What is BUFR, anyway?
If you haven’t had the displeasure of working with BUFR… I envy you! BUFR stands for the Binary Universal Form for the Representation of meteorological data. I know, it really just rolls off the tongue. Brought to you by the group behind other greatest hits such as GRIB and GRIB2, it is intended for storing and transporting meteorological data that are not gridded in nature. Like GRIB and GRIB2, the data are described by separately maintained and defined tables. These tables are used to interpret and decode the compact binary data for use.
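To make the “compact binary” claim concrete: every BUFR message opens with a fixed eight-octet Section 0 consisting of the four ASCII characters “BUFR”, a 24-bit big-endian total message length, and a one-octet edition number. A minimal sketch in Python (just an illustration of the layout; the hand-built header below is fabricated for the example):

```python
def parse_section0(buf: bytes) -> tuple[int, int]:
    """Parse BUFR Section 0: 4-byte magic, 3-byte total length, 1-byte edition."""
    if buf[:4] != b"BUFR":
        raise ValueError("not a BUFR message")
    total_length = int.from_bytes(buf[4:7], "big")  # 24-bit big-endian octet count
    edition = buf[7]
    return total_length, edition

# A hand-made header: magic, a (made-up) length of 1337 octets, edition 4.
header = b"BUFR" + (1337).to_bytes(3, "big") + bytes([4])
print(parse_section0(header))  # -> (1337, 4)
```

Note that the length is defined in terms of octets, not machine words, which is where the “portable” claim in the next section comes from.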
While my introduction to BUFR is far from subtle, it wasn’t chosen as an official meteorological data format just for the fun of being difficult. It does in fact have some positives:
- Binary data formats are compact and can be compressed, reducing dataset overhead and supporting space-efficient data storage.
- BUFR itself is considered “portable”. While not a problem the average programmer or meteorologist thinks about these days, there are no architectural dependencies on specific computer word length (i.e. 32-bit or 64-bit architectures) or byte order (big-endian vs little-endian).
- Table coded formats are considered extendable, since only the tables need to be updated and not the code logic itself.
- If a new observation type is created, just update the tables rather than the code…
I’m not over here trying to be a BUFR apologist, but these are the documented benefits the authors of the BUFR standard had in mind. However, what was considered one of its strengths is actually its greatest weakness:
- The BUFR tables are necessary to decode the data.
- Sometimes knowing definitively which tables were used is challenging, particularly with “local tables” defined outside the master table versions. If you’ve worked with NCEP GRIB2 data in Python, you’ve likely encountered this when fields are decoded as ‘unknown’. This can lead to ambiguity about what is actually encoded in the BUFR data, and a violation of the U in BUFR.
- BUFR software packages can be challenging to learn and use, and nobody sane has time to read WMO Manual #306. Even with good software, processing a dataset can still be full of friction and difficult for a first-timer.
- To decode data, you have to ship and load the BUFR tables at runtime. This is extremely restrictive if you want to do something like, say, decode BUFR data in a web browser using WebAssembly (more on that later…).
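To illustrate why the tables are so load-bearing, BUFR’s Table B maps each element descriptor to a name, unit, decimal scale, reference value, and bit width, and a packed value decodes as (raw + reference) / 10^scale. A hedged sketch with a single hand-written entry (descriptor 0 12 101 mirrors the standard air-temperature element, but treat the entry here as illustrative rather than copied from the real WMO tables):

```python
# Hypothetical mini "Table B": descriptor -> (name, unit, scale, reference, bit width).
TABLE_B = {
    "0 12 101": ("Temperature", "K", 2, 0, 16),
}

def decode_element(raw: int, descriptor: str) -> float:
    """Apply the Table B rule: value = (raw + reference) / 10**scale."""
    _name, _unit, scale, reference, _width = TABLE_B[descriptor]
    return (raw + reference) / 10**scale

# A packed value of 28815 with scale 2 decodes to 288.15 K (about 15 degC).
print(decode_element(28815, "0 12 101"))
```

Without the right Table B entry there is simply no way to know whether those 16 bits were a temperature, a pressure, or anything else, which is exactly the runtime-table-shipping problem described above.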
What would possess me to write a BUFR decoder from scratch??
It should be no surprise to those who know me (or to those who have read the other blog posts) that I am a bit of a Skew-T/Hodograph sounding analysis guy. Since joining SPC, I have been actively working to modernize the post-processing and data visualization software used at the Storm Prediction Center:
- Authored and released SHARPlib - a Python-wrapped C++ library for performant analysis of high-resolution observed and gridded sounding data.
- Created an updated, 3-km grid spacing version of the SPC Mesoscale Analysis that has evaluated quite well in the 2024 and 2025 Hazardous Weather Testbed experiments. This uses SHARPlib for high performance parallel post-processing of convective fields to keep latency low.
- Created a prototype of NSHARP — the original sounding analysis program from John Hart that SHARPpy was based on — that cross-compiles to macOS, Windows, Linux, and web browsers through WebAssembly. While still in early development, this is intended to be what we move to in the future and in a cloud-based weather service. Something pretty neat about it is that it can process and display full-resolution, 1-second sample-rate weather balloon data.
Recently, we (SPC) were approached by the fire-weather/IMET community with questions regarding the support for SHARPpy going forward, since their subject-matter expert was no longer present and some of SHARPpy’s functionality had been degrading (or was outright unusable). Since SHARPpy is not an officially supported project by SPC, and my other co-authors on that project have other roles and duties (Tim works for SPC with me though!), fixing their specific fork of SHARPpy was not feasible. However, after informing them of a fully-fledged version of NSHARP that could run in a web browser, it was agreed that this would be an excellent direction to go… especially since they were interested in displaying full-resolution sounding data launched by IMETs in the field.
The one catch… full-resolution IMET sounding data are encoded as BUFR data.
Beyond just the IMET use case, this was the straw that broke the camel’s back on taking this initiative. You see, high-resolution NWS radiosondes are also encoded and shipped over NOAAPORT as BUFR, and NCEP encodes model point forecast soundings as BUFR — provided over NOAAPORT and available on Amazon Web Services and the Google Cloud Platform. The need to handle BUFR decoding had been staring me in the face for quite some time, and the next version of NSHARP would be dependent on the ability to read and decode BUFR data natively.
There are perfectly good BUFR decoders that already exist. NCEP has an open-source library written largely in FORTRAN that is used operationally within the NWS, and it even includes Python wrappers. There is also the pure-Python library called pybufrkit that isn’t too terribly difficult to use. I’m pretty sure that ECMWF also has a library of their own. However, none of these libraries were going to be sufficient for the use case of NSHARP, particularly in a web browser. I really wasn’t interested in having FORTRAN as a dependency, and with the way the WebAssembly runtime works, loading runtime lookup tables for BUFR decoding was going to be hacky in the best case scenario and nearly impossible in the worst case. If I wanted to be able to read and decode BUFR data straight from the cloud providers as a byte stream, via NOAAPORT/LDM, or have a user upload a file to the browser, I was going to have to come up with something novel.
Sometimes in your life and career, circumstances find a way to provide you with something truly challenging. You can walk away from it and say “that’s too hard, no thank you”, or you can face it down and tackle it one step at a time until the problem has been solved. Up until this point, I had never worked with a low-level, binary data format. I certainly didn’t know anything about BUFR other than it was a challenge to work with. However, there was something… enticing… about the challenge posed by this task. Whether influenced by insanity, stupidity, or something else, I decided it was time to write a BUFR decoder. I cracked open the WMO Manual on Codes, Volume I.2, Part B - Binary Codes and got to learning.
Concluding Thoughts and What’s Coming in Part 2
This was quite a bit of setup and motivation, and there is a lot of stuff I’m really excited to talk about… things like:
- What language I chose and why (spoiler: it’s Zig)
- The internal structure and format of BUFR, how to decode it, and how Zig is particularly well suited for this task.
- Interesting solutions to the problems encountered when writing a BUFR decoder.
- My first-time experience with things like compile-time reflection.
- Probably more…
However, it felt best to split this into separate blog posts to prevent it from becoming overwhelming, especially since I would really like to get into some nitty-gritty technical details. This isn’t any sort of attempt to draw this out for content, I just genuinely believe the background and motivation is important for understanding the how and why of what comes next. I’m into my 4th week of writing this new BUFR decoder, and I am significantly further along than I had imagined when I started this project. In fact, the broader file decoding is done, and I am actively in the process of writing the portion that uses the tables to extract the encoded values!
I would totally understand if you thought I must have a few loose screws for taking on something like this, but I’ve genuinely learned a lot in the process, and I believe it has sharpened my existing skills and given me new tools. I am quite excited to share more and talk about those things very soon!
Thanks for joining me on my descent into madness… See you again soon for Part 2!