This post is from waaaaay back in the first COVID-19 lockdown. I never got round to converting it from .org format and actually posting it, but I’m starting to take a look at doing something with what I discovered (possibly making a Flipper Zero Zigbee sniffing module), so I’m going to get on and publish it now.

The biggest problem you might have when pursuing your own unique brand of technological freedom is that you might end up going deeper on it than you thought you would, and you’ll learn a whole shitload but not actually produce anything.

So obviously, in pursuit of having my own brand of smart-home sans the completely unnecessary yet somehow synonymous data-mining, I’ve finally managed to install a fork of Micropython on a smart switch and it’s only taken me FUCKING HOURS.

The impotent rage of a man who chooses his own problems (like wanting to be able to control the lights from magic rituals).

In all seriousness though, in this one I got to do the following for the first time:

- Debugged with gdb over the J-Link
- Built and flashed Micropython
- Hacked code on to something not explicitly designed for it
- Used vendor SVD files to get easy access to peripherals
- Wrote a gdb python plugin
- Used the hardfault registers to diagnose a problem
- Patched memset

A different DIY

Ikea have their own range of smart-home kit, called Tradfri. It’s super cheap, it doesn’t appear to phone home unnecessarily according to more diligent people, and most importantly, others have already hacked on it and shown that things like UART and SWD are easily accessed.

One of the projects that I quickly found and dove into was Trammell Hudson’s [[https://trmm.net/Ikea][Ikea hacking lightning talk]] along with his fork of Micropython for the SiliconLabs EFR32 and partially implemented Zigbee stack.

My plan here was to buy one of the super cheap smart switches along with a couple of the peripherals (a plug and a bulb), and to try to hack the switch so that it could be wired up over serial to a Raspberry Pi to send instructions over Zigbee to the peripherals. That would let me write my own HTTP endpoint and have complete control over the devices without them having to go near the internet (or without me having to install an app).

I’m increasingly under the impression that it’s a pretty specious assumption to think it could all Just Work like this within any reasonable timescale, but as you’ll see I have at least learned a lot.

Debugging

Getting set up was pretty easy. Using Trammell’s photos and instructions, I quickly:

- Got everything set up and the firmware built
- Managed to get a pretty neat setup whereby I was using an FTDI232 not only to interact with the thing over serial, but also to sub out the coin cell and supply 3.3V to the MCU
- Hooked up my J-Link EDU Mini running a gdb server
- Loaded up the file and set a breakpoint on the HardFault_Handler function
- Flashed the code and reset the MCU…
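
The gdb side of those steps can be scripted. What follows is only a rough sketch, not what I actually typed at the time: the function names and the "firmware.elf" path are hypothetical, and it assumes a gdb built with Python support talking to a J-Link gdb server on its default port of 2331.

```python
# Sketch only: session_commands/run_session and "firmware.elf" are
# hypothetical names, not part of Trammell's repo or of gdb itself.
try:
    import gdb  # only importable from inside a gdb session
except ImportError:
    gdb = None


def session_commands(elf_path, port=2331):
    """Return the gdb commands for the setup steps, in order."""
    return [
        f"file {elf_path}",                 # load symbols from the build
        f"target remote localhost:{port}",  # attach to the J-Link gdb server
        "break HardFault_Handler",          # stop as soon as we fault
        "load",                             # flash the image
        "monitor reset",                    # J-Link gdb server reset command
    ]


def run_session(elf_path, port=2331):
    """Execute the commands; only works when sourced from within gdb."""
    if gdb is None:
        raise RuntimeError("run this from inside gdb, e.g. via `source`")
    for command in session_commands(elf_path, port):
        gdb.execute(command)
```

Sourcing the file in gdb and calling run_session("firmware.elf") from a python prompt should leave the MCU reset with the breakpoint armed, ready for a continue.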

Aaaaand much to my surprise we pretty much instantly hit the HardFault_Handler breakpoint.

This led to me googling and almost instantly finding [[https://interrupt.memfault.com/blog/cortex-m-fault-debug][this fantastic article on debugging hardfaults on Cortex-M devices]]. It teaches you all about what peripherals are available by default on a Cortex-M which you can use to find out why your code failed.

Python helpers

I spent quite some time trying to figure out why the SVD files (which give the peripherals’ hex addresses some slightly more friendly names and convenience functions) didn’t seem to have the CFSR register that I needed to debug. It turns out that most vendors don’t include the core ARM peripherals in their machine-readable definitions and, more infuriatingly, ARM won’t provide these either.
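
For reference, here’s a minimal sketch of where the fault registers live on a Cortex-M3/M4-class core and how the 32-bit CFSR splits into its three sub-registers. The addresses are the standard ARMv7-M ones; the dictionary and function names are made up for illustration.

```python
# Fault-related System Control Block registers on ARMv7-M (Cortex-M3/M4).
# The names SCB_FAULT_REGISTERS and split_cfsr are hypothetical helpers.
SCB_FAULT_REGISTERS = {
    "CFSR": 0xE000ED28,   # Configurable Fault Status Register (32 bits)
    "HFSR": 0xE000ED2C,   # HardFault Status Register
    "MMFAR": 0xE000ED34,  # MemManage Fault Address Register
    "BFAR": 0xE000ED38,   # BusFault Address Register
}


def split_cfsr(cfsr):
    """Split a raw 32-bit CFSR value into its three sub-registers."""
    return {
        "MMFSR": cfsr & 0xFF,           # byte 0: MemManage faults
        "BFSR": (cfsr >> 8) & 0xFF,     # byte 1: bus faults
        "UFSR": (cfsr >> 16) & 0xFFFF,  # upper half-word: usage faults
    }
```

Because BFSR is byte 1 of CFSR, a single byte read at 0xE000ED29 is enough to get at the bus fault flags.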

To save myself some mental cycles I quickly threw together the following based on some simple examples online:

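import gdb

# BFSR is the second byte of the 32-bit CFSR at 0xE000ED28.
BFSR_ADDR = "0xE000ed29"
# gdb size letter: "b" reads a single byte.
BFSR_LEN = "b"
# Flag names, listed from bit 0 upwards.
BFSR_BITS = [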
    "IBUSERR",
    "PRECISERR",
    "IMPRECISERR",
    "UNSTKERR",
    "STKERR",
    "LSPERR",
    "Reserved",
    "BFARVALID"
    ]

class CFSR(gdb.Command):
    def __init__(self):
        gdb.Command.__init__(self, "cfsr", gdb.COMMAND_DATA)

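    def _get_bits(self, address, length):
        # "x/bt ADDR" prints something like "0xe000ed29:<TAB>10010010";
        # keep only the binary digits after the tab.
        output = gdb.execute(f"x/{length}t {address}", True, True)
        bits = list(output.split("\t")[1].strip())
        # gdb prints the most significant bit first; flip so index 0 is bit 0.
        bits.reverse()
        return bits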

    def invoke(self, args, from_tty):
        cmd = str(args).split(" ")
        if cmd[0] == "bfsr":
            bits = self._get_bits(BFSR_ADDR, BFSR_LEN)
            for name, bit in zip(BFSR_BITS, bits):
                gdb.write(f"{name}: {bit}\n")
            gdb.flush()
        return

if __name__ == "__main__":
    CFSR()

I’d already determined by figuring this stuff out manually that the BFSR register contained the information I needed, and I just needed a more intelligible way to be able to view the output without getting into the weeds of a full implementation:

(gdb) c
Continuing.

Breakpoint 3, HardFault_Handler () at main.c:236
236             uart_str("!!!!!\r\n");
(gdb) cfsr bfsr
IBUSERR: 0
PRECISERR: 1
IMPRECISERR: 0
UNSTKERR: 0
STKERR: 1
LSPERR: 0
Reserved: 0
BFARVALID: 1
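
The output above corresponds to a BFSR byte of 0x92. As a rough sketch, the same decoding can be done on a plain integer, which makes it easy to check without a board or gdb attached. The function name decode_bfsr is hypothetical; the bit order is the same one used in the plugin.

```python
# Flag names for BFSR, listed from bit 0 upwards.
BFSR_BITS = [
    "IBUSERR",
    "PRECISERR",
    "IMPRECISERR",
    "UNSTKERR",
    "STKERR",
    "LSPERR",
    "Reserved",
    "BFARVALID",
]


def decode_bfsr(value):
    """Return the names of the flags set in a raw BFSR byte."""
    return [
        name
        for bit, name in enumerate(BFSR_BITS)
        if name != "Reserved" and value & (1 << bit)
    ]


print(decode_bfsr(0x92))  # ['PRECISERR', 'STKERR', 'BFARVALID']
```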

Looking for the error

This roughly translates to:

- PRECISERR :: We know exactly which instruction caused the hardfault
- STKERR :: The fault happened while pushing registers on to the stack for exception entry, so it was probably access to invalid memory
- BFARVALID :: The BFAR register contains the memory address that caused the problem

This was pretty helpful because checking an actual backtrace appeared to be of little use - it didn’t seem to make sense.

The address in BFAR was just below the stack at the top of ROM, so this was a pretty good smoking gun, and matched almost exactly the example given in the article.

By setting a watchpoint on memory just above the bottom of the stack, I was able to see the following:

...
#18 0x00015804 in memset (s=0x21000f7c, c=c@entry=0,
    n=n@entry=132) at ../../lib/libc/string0.c:86
#19 0x00015804 in memset (s=0x21000f7c, c=c@entry=0,
    n=n@entry=132) at ../../lib/libc/string0.c:86
#20 0x00015804 in memset (s=0x21000f7c, c=<optimized out>, n=132)
    at ../../lib/libc/string0.c:86
#21 0x0001e742 in RADIO_SeqInit ()
#22 0x0001f8bc in GENERIC_PHY_RACConfig ()
#23 0x0001f900 in GENERIC_PHY_Init ()
#24 0x0001d24c in RFHAL_Init ()
#25 0x0001bc8c in RAILCore_Init ()
#26 0x00014efe in radio_init () at radio.c:347
#27 radio_init () at radio.c:317
#28 0x000145c4 in main (argc=<optimized out>,
    argv=<optimized out>) at main.c:92

(gdb)

This is pretty frustrating because the problem appears to be right in the middle of a binary blob, and my reverse-engineering skills aren’t up to much, but notably in stepping through from that point, the memset frames on the stack just keep on iterating upward.

I emailed Trammell about this and he said that he’d occasionally seen similar but not gone too deep on the debugging (which, quite frankly, made me feel like I was doing a good job), but helpfully asked if I’d tried just commenting the radio code out.

Not just the radio

Trying this out, it transpires that even on hitting the gc_init code (setting up the heap for use as garbage collected memory for Micropython, or so I assume) it shits the bed at memset again.

Reading the implementation and not being particularly au fait with scarce-resource optimizations I didn’t really know what to make of this:

void *memset(void *s, int c, size_t n) {
    if (c == 0 && ((uintptr_t)s & 3) == 0) {
        // aligned store of 0
        uint32_t *s32 = s;
        for (size_t i = n >> 2; i > 0; i--) {
            *s32++ = 0;
        }
        if (n & 2) {
            *((uint16_t*)s32) = 0;
            s32 = (uint32_t*)((uint16_t*)s32 + 1);
        }
        if (n & 1) {
            *((uint8_t*)s32) = 0;
        }
    } else {
        uint8_t *s2 = s;
        for (; n > 0; n--) {
            *s2++ = c;
        }
    }
    return s;
}

Honestly? My lockdown brain can’t comprehend that, and certainly not how it seemed to result in recursive behaviour. This was crossing a couple of areas of expertise, none of which dovetail with mine. However, git log -p -- lib/libc/string0.c was very kind to me and let me know that the current implementation was a performance optimization.

That meant there was an earlier implementation I could check out!

I copied the old code out from before that commit:

void *memset(void *s, int c, size_t n) {
    uint8_t *s2 = s;
    for (; n > 0; n--) {
        *s2++ = c;
    }
    return s;
}

YEAH BOIIIII I’m smart enough to understand that!
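
As a sanity check that the two versions are at least meant to do the same thing, here’s a rough Python model of both, operating on a bytearray instead of real memory. It only mirrors the C-level logic (the function names and the start offset parameter are mine), so it says nothing about what the compiler actually generated and can’t explain the hardfault.

```python
def memset_simple(mem, start, c, n):
    """Model of the old version: one byte store per iteration."""
    for i in range(n):
        mem[start + i] = c & 0xFF
    return mem


def memset_optimized(mem, start, c, n):
    """Model of the optimized version: word stores when zeroing aligned memory."""
    if c == 0 and start % 4 == 0:
        pos = start
        for _ in range(n >> 2):       # n // 4 whole 32-bit words
            mem[pos:pos + 4] = b"\x00" * 4
            pos += 4
        if n & 2:                     # one leftover 16-bit half-word
            mem[pos:pos + 2] = b"\x00" * 2
            pos += 2
        if n & 1:                     # one leftover byte
            mem[pos] = 0
    else:
        for i in range(n):            # fallback: same byte loop as before
            mem[start + i] = c & 0xFF
    return mem


# The call from the backtrace: zero 132 bytes at an aligned address.
a = memset_simple(bytearray(b"\xff" * 160), 0, 0, 132)
b = memset_optimized(bytearray(b"\xff" * 160), 0, 0, 132)
assert a == b
```

For n=132 the optimized path boils down to 33 word stores with no leftover half-word or byte, so on paper there’s nothing exotic about that particular call.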

I uncommented the radio code in order to see if it all still caught fire and hey presto, not only did it not hardfault but I got a REPL over the serial console!

But WHY?

Honestly fucked if I know. From the way that it would jump back into memset, it feels like it was perhaps accidentally rewriting the stack pointer over and over again…

I should probably debug this and see if it’s worth trying to fix properly in the original implementation, but I also should actually try to do the thing I originally set out to do.

I’ll update here when I decide which to do - I’m already taking time out from that to write it all down!