As some of you may have noticed, we had a few problems with our import system last week. After several hours of investigation, and with a fix thankfully now in place, we wanted to share some of the details of what happened.
The Short Version
Late last week, we noticed some problems when people visited the Subscriber Import page. Most of the time, things would work fine; sometimes, you might be shown a server error. That’s pretty confusing (and frustrating!) for anyone. The worst part was that, though we could reproduce it, we couldn’t make any sense of it.
After a lot of investigation, we were able to track down the problem to some new code we added around that time. Our intention was to set the locale, useful for those with languages that may format dates a bit differently than English, but it turns out our cache system (which helps us serve pages as fast as we do) had some problems working with different locales.
You might be thinking that doesn’t make any sense. We think you’re right, and we were pretty confused; this was not the most straightforward debugging session, to say the least. But our testing confirms that code was the culprit.
If you’re worried about the locale not being set, don’t fret; we have some plans to reintroduce that feature, but in another form. From here on, your visits to the import page should be error-free.
The Long and Technical Version
As a programmer, I like bugs that make sense: the page isn’t loading, and there’s a typo? That would do it. Oh, we forgot to save this color change to the database; that’s why it’s not reflected when I refresh the page.
I don’t like bugs that make no sense, so the bug here was pretty frustrating, not only for me but for everyone in the office. It first showed up in a third-party library for connecting to Google Contacts: one of the interfaces there (apiIO) was reported as not defined. What an odd thing, particularly for code we’d been using for a while. Still, the bug appeared at the same time as a new feature we had released, and we suspected that something in that feature wasn’t interacting well with the rest of our software.
Not long after we began looking, we disabled APC, which is the cache system we use for PHP, and the error magically disappeared. That was stranger still, and told us right away that whatever we were looking for wasn’t going to make a lot of sense. (We were right about that in the end.) It also told us that we needed to be a bit less conventional in our thought process.
The interface in question is, of course, present in one of the code files of Google’s library. We thought perhaps the way those files are included doesn’t work well with APC, so we stopped what Google was doing (setting the system’s include path) and changed the library to include files using an absolute path. It seemed like it was working! — but then we saw server errors again. Back to the drawing board.
The apiIO interface is really the only one in the library, and we don’t tend to use that particular feature of PHP ourselves. What’s an interface but a class with empty methods, right? So we changed apiIO to actually be a class. It seemed like it was working! — but no.
Sigh. Well, APC was turned off for now, and in the meantime another error popped up in which some methods in a class our sending engine uses seemed not to exist, so we turned our attention to that (since sending is important). (It should be pointed out that this problem was only affecting one of our users; everyone else was able to send just fine.) It turned out the class had all the methods it should: if you ran the PHP function that lists all of a class’s methods, they were all there. But a few of them were missing from the object.
Let’s make it weirder. The methods that were missing all had a capital letter ‘I’ in their names. They should have been there, but they weren’t, and by this time we were thinking this was more like magic than programming. We knew that by the time the error happened, the methods were gone; we also knew, through testing, that when we started to send with this example, the methods were there. Sometime in the middle, they went away, which is weirder still.
So we used divide and conquer: pick a point in the code halfway between the start of the run and the point where the error occurs. Are the methods there or not? Then repeat on whichever half contains the change. On and on we went, brute-force, until we tracked it down to... PHP’s setlocale function. Huh?
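For anyone who hasn’t done this kind of search before, here is a rough sketch of the idea in Python (not our actual PHP code). The step names and the is_healthy_after check are made up for illustration, and the sketch assumes that once the methods vanish they stay gone for the rest of the run.

```python
from typing import Callable, Sequence


def first_bad_step(steps: Sequence[str],
                   is_healthy_after: Callable[[int], bool]) -> str:
    """Return the first step after which the program state is broken.

    Assumes the state is known to be broken after the last step, and that
    it never recovers once broken (so a binary search is valid).
    """
    lo, hi = 0, len(steps) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_healthy_after(mid):
            lo = mid + 1   # still fine here, so the culprit is later
        else:
            hi = mid       # already broken, so the culprit is here or earlier
    return steps[lo]


# Hypothetical stages of a send; in this toy run the methods vanish at index 2.
steps = ["load_user", "load_template", "set_locale", "build_message", "send"]
culprit = first_bad_step(steps, lambda i: i < 2)
print(culprit)
```

Each check halves the amount of code left to suspect, which is why even a brute-force hunt like ours eventually lands on a single call.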
This particular user had been using our Turkish language support. Setting that locale turns out to trigger a longstanding bug in PHP, one to which our version was still susceptible. It seems that setting the locale has a somewhat illogical but nevertheless destructive impact on class methods (in particular, those with a capital letter ‘I’ in their names). When we stopped setting the locale, sending was fixed.
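For the curious, here is a minimal sketch of how a locale could make names vanish. It is an illustration in Python, not PHP’s actual internals, and it rests on an assumption: that names are stored in lowercase for case-insensitive lookup, and that a later lookup lowercases the requested name using the current locale’s rules. In Turkish, the lowercase of ‘I’ is the dotless ‘ı’ rather than ‘i’, so the two stop matching. The helpers (ascii_lower, turkish_lower, MethodTable) and the name sendInvoice are hypothetical; only apiIO comes from the story above.

```python
def ascii_lower(name: str) -> str:
    """Lowercase using only the ASCII rules (I -> i)."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in name)


def turkish_lower(name: str) -> str:
    """Hypothetical stand-in for lowercasing under a Turkish locale."""
    # Turkish special cases: dotted capital İ -> i, plain capital I -> dotless ı.
    return ascii_lower(name.replace("İ", "i").replace("I", "ı"))


class MethodTable:
    """Toy case-insensitive name table, keyed by lowercased names."""

    def __init__(self):
        self._names = set()

    def register(self, name, lower=ascii_lower):
        # Names are stored once, lowercased with whatever rules apply then.
        self._names.add(lower(name))

    def has(self, name, lower):
        # Lookups lowercase the name again, possibly with different rules.
        return lower(name) in self._names


names = ("apiIO", "sendInvoice", "send")
table = MethodTable()
for n in names:
    table.register(n)  # stored as "apiio", "sendinvoice", "send"

# After "switching locale", every name containing a capital I goes missing.
missing = [n for n in names if not table.has(n, turkish_lower)]
print(missing)
```

In this toy version, only the names with a capital ‘I’ disappear, which matches the pattern we saw; names without one are untouched.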
And then we thought, hey — that error with Google’s library when APC was turned on? Didn’t that interface have a capital letter ‘I’ in it? It did! But the people getting the error weren’t using Turkish; they were, in many cases, using only English. Still, we had a theory, and it sort of made twisted sense. So we turned off setting the locale for everyone, and turned on APC, and…
It worked. The import page; everything worked. It shouldn’t have mattered, and it still doesn’t make any sense, but setting the locale made methods, classes and interfaces disappear, in particular those with a capital letter ‘I’. Thus ended one of the most bizarre debugging sessions our developers have ever had.