Error Messages Really Matter

Error messages largely determine how quickly you can find a problem in your code.

This is what I learned after spending 2 days on an iOS push notification issue.

The iOS push notification issue

  1. We use Rpush1 to send notifications to mobile devices (Android/iOS).
  2. I did a large refactoring of our codebase to wrap the Rpush API in our own class (and, along the way, added more tests to cover the notification push logic).
  3. Even though all the tests passed and the Rpush::Apns::Notification objects were persisted correctly, the iOS notifications could not be sent. We only got the following error from Rpush:

    Lost connection to gateway.push.apple.com:2195 (Errno::EPIPE)
    
  4. But the Android notifications could be sent successfully.

Trying to fix the wrong problem (app configuration)

  • I thought this must be due to a misconfiguration in our Rpush::Apns::App.
  • I also found this issue2, which suggested solutions like "recreate the certificate" or "use the correct environment value".
  • I tried them all, but none of them worked; I kept getting the same error.

The real issue (Invalid Device Token)

  • One thing I noticed was that we could still send notifications from the Google Firebase3 console.
    • So I started looking for differences between the Rpush app configuration and the Firebase configuration.
    • I found that the Firebase configuration used an APNs Authentication Key instead of the APNs Certificates we were using in the Rpush configuration.

  • Another thing I noticed was that newly created device tokens failed the device_token validation on Rpush::Apns::Notification.
    • The old tokens were short hex strings like da389196
    • The invalid tokens were much longer strings like ekKXarhmDOA:APA91bH9fsR-Dj5s0PQkwVZ-YwQtgDnMgaT8VwVVf5nFZ5KskGU5QLeVMOHXEO0wMzHfp1ifGAvNjipuvw5M6tpQO3e2fdrNw1rW4C9IPCoRGyFetwOrkuDdaDb0ftNoZmcrFyYJwJhC
    • The validation had been overridden by another developer, so I assumed our old version of Rpush was using the wrong validation rule.
  • Finally, we checked with the iOS developer (we are in different time zones). It turned out he had updated the iOS client to use Firebase as the push notification backend, without telling us.
    1. Firebase uses an APNs Authentication Key to send notifications
    2. Firebase tokens are long strings like ekKXarhmDOA:APA91bH9fsR-Dj5s0PQkwVZ-YwQtgDnMgaT8VwVVf5nFZ5KskGU5QLeVMOHXEO0wMzHfp1ifGAvNjipuvw5M6tpQO3e2fdrNw1rW4C9IPCoRGyFetwOrkuDdaDb0ftNoZmcrFyYJwJhC
  • The iOS developer rolled back the changes, and the notifications could be pushed as normal again.
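
A cheap format check might have flagged the token mismatch before a notification was ever queued. The following is a minimal sketch, not Rpush's own validator: `classify_device_token` and `device_token_error` are hypothetical helpers, and the rules they encode (APNs tokens are plain hexadecimal strings, Firebase tokens contain a colon and other non-hex characters) are assumptions based only on the two example tokens above.

```ruby
# Hypothetical helpers: guess which push service issued a device token.
# Assumption: APNs tokens are plain hexadecimal strings, while Firebase
# tokens contain ":" and other non-hex characters.
APNS_TOKEN_PATTERN = /\A[0-9a-f]+\z/i

def classify_device_token(token)
  return :blank if token.nil? || token.strip.empty?
  return :apns if token.match?(APNS_TOKEN_PATTERN)
  return :firebase if token.include?(":")

  :unknown
end

# Returns nil for a plausible APNs token, otherwise a readable error.
def device_token_error(token)
  case classify_device_token(token)
  when :apns     then nil
  when :firebase then "device_token looks like a Firebase token, not an APNs token"
  when :blank    then "device_token is missing"
  else                "device_token is not a hexadecimal APNs token"
  end
end
```

Called with a token like da389196, `device_token_error` returns nil; called with a token like ekKXarhmDOA:APA91bH9fsR..., it returns the Firebase message, which points at the client rather than at the certificate.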

Good error messages really matter

In retrospect, if I had gotten a more meaningful error message from either Rpush or APNs, I could have fixed this issue much more quickly instead of spending two days on it.

What I expected was an error message that led me directly to the invalid token, but neither Rpush nor APNs provided one.

Better error message from Rpush (Library)

There's only one possible error for this case from Rpush4:

    [Connection 0] Lost connection to gateway.push.apple.com:2195 (#{error.class.name}, #{error.message}), reconnecting...

This error contains nothing that points to the problem I needed to solve.

I still had to Google it, and in this case Google led me to the wrong solution (invalid certificate2).

I think there are two possible improvements here:

  1. A small improvement would be adding the possible cause to the original error message (which is what the newest version of Rpush does).
  2. A better improvement would be parsing the error responses from APNs and providing more meaningful error messages for developers.
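
The second improvement could look roughly like the sketch below. It assumes the legacy binary APNs interface (the one behind gateway.push.apple.com:2195), which reports a rejected notification as a 6-byte error-response packet: a command byte (8), a status code, and the notification identifier. `describe_apns_error` and the wording of its messages are hypothetical, not Rpush's actual API; the status codes are the ones Apple documented for that interface.

```ruby
# Status codes of the legacy binary APNs error-response packet.
APNS_ERROR_STATUSES = {
  0   => "No errors encountered",
  1   => "Processing error",
  2   => "Missing device token",
  3   => "Missing topic",
  4   => "Missing payload",
  5   => "Invalid token size",
  6   => "Invalid topic size",
  7   => "Invalid payload size",
  8   => "Invalid token",
  10  => "Shutdown",
  255 => "None (unknown)"
}.freeze

# Hypothetical helper: turn the raw 6-byte packet into a readable message.
def describe_apns_error(packet)
  return "No error response received from APNs" if packet.nil? || packet.bytesize < 6

  # C = unsigned 8-bit integer, N = unsigned 32-bit big-endian integer
  command, status, notification_id = packet.unpack("CCN")
  return "Unexpected APNs response (command #{command})" unless command == 8

  reason = APNS_ERROR_STATUSES.fetch(status, "Unknown status")
  "APNs rejected notification #{notification_id}: #{reason} (status #{status})"
end
```

A message such as "Invalid token" next to the lost-connection line would likely have pointed me at the device tokens rather than at the certificate.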

Better error message from APNs (API responses)

I didn't check the error message from APNs while debugging, but this reminded me of the error messages from AWS5.

It's not easy to provide meaningful error messages in the API responses, because

  1. It's already hard to implement the right logic.
  2. If we provide too much info (like returning "This user doesn't exist" for invalid username but "The password is wrong" for wrong password), it might be used to attack our system.

But I think it's still possible to do that:

  1. Return "Invalid certificate or device_token", just as most sites now return "invalid username or password" for user authentication.
  2. Validate the device_token and return a specific error when validation fails.
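
The two ideas can be combined. The sketch below is a hypothetical API handler, not how APNs actually behaves: `push_error_response`, its parameters, the status numbers, and the error codes are all made up for illustration. Format problems, which reveal nothing secret, get a specific message, while a certificate or token mismatch gets the deliberately vague one.

```ruby
# Hypothetical response builder for a push API.
# certificate_valid and token_registered stand in for real lookups.
def push_error_response(device_token:, certificate_valid:, token_registered:)
  # Format checks leak nothing sensitive, so they can be specific.
  if device_token.to_s.empty?
    return { status: 400, code: "missing_device_token",
             message: "device_token is required" }
  end
  unless device_token.match?(/\A[0-9a-f]+\z/i)
    return { status: 400, code: "malformed_device_token",
             message: "device_token must be a hexadecimal string" }
  end

  # Which of the two was wrong stays vague on purpose,
  # like "invalid username or password".
  unless certificate_valid && token_registered
    return { status: 403, code: "invalid_credentials",
             message: "Invalid certificate or device_token" }
  end

  nil # no error
end
```

With this split, a client that starts sending a differently formatted token gets "device_token must be a hexadecimal string" immediately, without the API revealing whether a given certificate or token exists.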

Learn from programming language error messages

More and more attention is being paid to good error messages.

Languages like Elixir and Elm6 are providing pretty good error messages.

I have heard people say that they can implement some Elm functions just by following the error messages, without consulting the documentation.

I think error messages will definitely play an important role in the future of programming (languages/libraries/APIs).

If you are interested, you can also check out this talk7 by Ben Orenstein.