Cloud is a fully managed service that makes it easy for developers to create, secure, publish and maintain APIs at any scale, anywhere in the world.
Whenever customers sign up for the service, we send them an email welcoming them onboard. We emit logs to enable our developers to troubleshoot if any error occurs in the email-sending functionality (network timeouts, etc.).
To improve the developer debugging experience, we added request-scoped values such as request IDs to those logs. The typical way to do that in Go is with context.Context.
Maintaining a consistent understanding of the context is essential as data flows through various stages. It helps to ensure accurate processing, error handling, logging, and other operational aspects.
A context usually carries a timeframe within which the service must respond, after which it is cancelled. Often, though, we need to pass its data on to other components or stages of a system without that work being interrupted or prematurely terminated.
This blog post takes a closer look at propagating context without cancellation, why it’s important, and how we found a solution.
Into the context we go
Initially, our code looked like this:
func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Create a subscription, etc.
	// Send email to the customer.
	go sendEmail("accountID", "subscriptionPlan")
}

func sendEmail(accountID, subscriptionPlan string) {
	ctx := context.Background()
	ctx, cancel := context.WithTimeout(ctx, 120*time.Second)
	defer cancel()
	// Call a third-party email sending service.
	err := thirdPartyMailService(ctx, accountID, subscriptionPlan)
	if err != nil {
		log.Error("Failed to send email.", err)
	}
}
The OnboardAccount HTTP handler is called when someone signs up as a customer on Cloud. It does several things synchronously, like creating a subscription and creating an organisation, and eventually sends a welcome email to the customer asynchronously.
As mentioned, we wanted to update the code so that sendEmail takes a context.Context as a parameter. We would then pass in the request's context, r.Context(), when calling sendEmail. This way, the logs emitted in the sendEmail function would be richer, since they would now contain request-scoped values (request IDs, etc.) for each specific request.
We updated the code to:
func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Create a subscription, etc
	// Send email to the customer.
	go sendEmail(r.Context(), "accountID", "subscriptionPlan")
}

func sendEmail(ctx context.Context, accountID, subscriptionPlan string) {
	ctx, cancel := context.WithTimeout(ctx, 120*time.Second)
	defer cancel()
	// Call a third-party email sending service.
	err := thirdPartyMailService(ctx, accountID, subscriptionPlan)
	if err != nil {
		log.Error("Failed to send email.", err)
	}
}
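For the logs to carry a request ID, something upstream has to put it into the request context first. The article does not show that part, so the following is only a minimal sketch under assumptions: the middleware name withRequestID, the key type requestIDKey, the helper logLine, and the X-Request-Id header are all hypothetical and not taken from the Cloud codebase.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// requestIDKey is an unexported key type, so other packages cannot collide with it.
type requestIDKey struct{}

// withRequestID is a hypothetical middleware that copies a request ID from a
// header (the header name is an assumption) into the request context.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-Id")
		ctx := context.WithValue(r.Context(), requestIDKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// logLine formats a message prefixed with the request ID found in ctx, if any.
func logLine(ctx context.Context, msg string) string {
	id, _ := ctx.Value(requestIDKey{}).(string)
	return fmt.Sprintf("RequestID=%s, %s", id, msg)
}

// handle runs one request through the middleware and returns the line that a
// handler further down the chain would log.
func handle(requestID string) string {
	var line string
	h := withRequestID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		line = logLine(r.Context(), "Failed to send email.")
	}))
	req := httptest.NewRequest(http.MethodPost, "/onboard", nil)
	req.Header.Set("X-Request-Id", requestID)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return line
}

func main() {
	fmt.Println(handle("Kj24jR8LQha")) // RequestID=Kj24jR8LQha, Failed to send email.
}
```

Any function that receives this context, including one running in a goroutine, can read the same request ID, which is why passing the context into sendEmail was attractive.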
Soon after, we started seeing these errors in our services’ logs:
"RequestID=Kj24jR8LQha, Failed to send email. context canceled"
It was great to see that the logs now contained the relevant request-scoped values like RequestID, but what was up with that "context canceled" error?
This happened for almost every call of sendEmail, which was surprising since we were using a generous timeout of 120 seconds when calling thirdPartyMailService. This value had served us very well in the past. We established that the third-party email SaaS systems were healthy and had experienced no downtime.
After a cup of coffee and proper scrutiny of the new code, we zeroed in on this line:
go sendEmail(r.Context(), "accountID", "subscriptionPlan")
The problem was that the context returned by r.Context() is scoped to the lifetime of the HTTP request. This context is therefore cancelled as soon as the OnboardAccount HTTP handler returns. Since the sendEmail call runs in a goroutine, it can run after OnboardAccount has returned, by which point the context has already been cancelled.
Here is a small stand-alone reproducer of the issue:
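package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	OnboardAccount()
	// Give the goroutine time to run before the program exits.
	time.Sleep(100 * time.Millisecond)
}

func OnboardAccount() {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // Cancels ctx as soon as OnboardAccount returns.
	go sendEmail(ctx)
	fmt.Println("OnboardAccount called")
}

func sendEmail(ctx context.Context) {
	fmt.Println("sendEmail called")
	ctx, cancel := context.WithTimeout(ctx, 120*time.Second)
	defer cancel()
	fmt.Println("sendEmail, ctx.Err(): ", ctx.Err())
}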
go run -race ./...
OnboardAccount called
sendEmail called
sendEmail, ctx.Err(): context canceled
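The same behaviour can be observed with a real net/http handler. The sketch below is an illustration only: it uses httptest.NewServer, and the helper names handlerContextErr and canceledAfterReturn are hypothetical. The handler hands r.Context() to a goroutine and returns immediately; the goroutine waits for the context to finish and reports the error it sees.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// handlerContextErr starts a test server whose handler passes r.Context() to a
// goroutine, then returns the error that goroutine observes once the context
// is done.
func handlerContextErr() error {
	result := make(chan error, 1)
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx := r.Context()
		go func() {
			<-ctx.Done() // Unblocks once the request context is cancelled.
			result <- ctx.Err()
		}()
		// The handler returns right away, like OnboardAccount does.
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		return err
	}
	resp.Body.Close()
	return <-result
}

// canceledAfterReturn reports whether the goroutine saw context.Canceled.
func canceledAfterReturn() bool {
	return errors.Is(handlerContextErr(), context.Canceled)
}

func main() {
	fmt.Println("goroutine saw:", handlerContextErr())
}
```

Because the goroutine blocks on ctx.Done(), this version does not depend on goroutine scheduling: the request context is always cancelled once the handler has returned.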
We reverted the code change and began looking for a permanent solution to the problem. We needed a way to propagate a context’s values without also propagating its cancellation.
Solution space
Surely, someone else in the Go community must have experienced similar issues? It turned out that this was not an uncommon problem; there was even an existing Go proposal to address it in the standard library. At the time, that proposal had not yet been accepted, so we had to look for alternative solutions.
There are multiple[1][2][3] third-party packages that implement a context.Context that can be propagated without cancellation. Most of those were Go internal packages, which we could not import.
We thus created a small library in our application that offered this functionality and updated our code to utilise it:
import "our/pkg/xcontext"

func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Send email to the customer.
	go sendEmail(
		// Propagate context without cancellation.
		xcontext.Detach(r.Context()),
		"accountID",
		"subscriptionPlan",
	)
}
This fixed the issue.
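The article does not show the internals of xcontext, so the following is only a sketch of one common way such a Detach function can be written, not the actual library. The idea is a wrapper type that forwards Value lookups to the parent context while reporting no deadline, no Done channel, and no error. The names detachedContext, requestIDKey, and detachDemo are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// detachedContext keeps the parent's values but ignores its cancellation
// and deadline.
type detachedContext struct {
	parent context.Context
}

func (d detachedContext) Deadline() (time.Time, bool) { return time.Time{}, false }
func (d detachedContext) Done() <-chan struct{}       { return nil }
func (d detachedContext) Err() error                  { return nil }
func (d detachedContext) Value(key any) any           { return d.parent.Value(key) }

// Detach returns a context that carries ctx's values but is never cancelled.
func Detach(ctx context.Context) context.Context {
	return detachedContext{parent: ctx}
}

type requestIDKey struct{}

// detachDemo cancels a parent context and reports what the detached copy
// still sees afterwards.
func detachDemo() string {
	parent, cancel := context.WithCancel(context.Background())
	parent = context.WithValue(parent, requestIDKey{}, "Kj24jR8LQha")
	detached := Detach(parent)
	cancel() // Simulates the HTTP handler returning.
	return fmt.Sprintf("parent err=%v, detached err=%v, request id=%v",
		parent.Err(), detached.Err(), detached.Value(requestIDKey{}))
}

func main() {
	fmt.Println(detachDemo())
	// parent err=context canceled, detached err=<nil>, request id=Kj24jR8LQha
}
```

A detached context never expires on its own, so the callee should still apply its own bound, as sendEmail does with context.WithTimeout.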
And there’s more good news: the aforementioned Go proposal has since been accepted and implemented, and it is available in Go 1.21, which was released in early August 2023. With that release, this is how you can use the newly added API:
import "context"

func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	// Send email to the customer.
	go sendEmail(
		// Propagate context without cancellation.
		context.WithoutCancel(r.Context()),
		...
	)
}
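To see what context.WithoutCancel preserves and what it drops, here is a small stand-alone sketch that requires Go 1.21 or later. The key type requestIDKey and the helper withoutCancelDemo are hypothetical names used only for illustration. It cancels the parent context, as would happen when a handler returns, and then inspects a context derived from the detached copy with a fresh 120-second timeout.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

type requestIDKey struct{}

// withoutCancelDemo detaches a context from its parent's cancellation, adds a
// new timeout on top, cancels the parent, and reports the resulting state.
func withoutCancelDemo() string {
	parent, cancel := context.WithCancel(context.Background())
	parent = context.WithValue(parent, requestIDKey{}, "Kj24jR8LQha")

	// Values are kept; cancellation and deadline of the parent are not.
	detached := context.WithoutCancel(parent)

	// The background work still gets its own upper bound.
	ctx, cancelTimeout := context.WithTimeout(detached, 120*time.Second)
	defer cancelTimeout()

	cancel() // Simulates the HTTP handler returning.

	_, hasDeadline := ctx.Deadline()
	return fmt.Sprintf("err=%v, request id=%v, has deadline=%v",
		ctx.Err(), ctx.Value(requestIDKey{}), hasDeadline)
}

func main() {
	fmt.Println(withoutCancelDemo())
	// err=<nil>, request id=Kj24jR8LQha, has deadline=true
}
```

The request ID survives, the parent's cancellation does not reach the derived context, and the timeout added afterwards still applies.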
Now, a question remains – what if someone forgets to use xcontext.Detach or context.WithoutCancel?
Wouldn’t it be better to have a linter for this scenario? I enquired on gophers-slack whether anyone knew of one; nothing seemed available.
Soon after, Damian Gryski added a linter for this to his awesome repository. Go Damian! I also sent him a small bug fix.
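To give a feel for what such a check might look for, here is a rough sketch built only on the standard library's go/ast and go/parser packages. It is not the linter mentioned above and makes no claim about how that linter works. It applies a purely syntactic heuristic: flag any go statement whose call receives the direct result of a zero-argument .Context() method call. The function names findRequestContextInGo and flaggedLines, and the sample source, are hypothetical.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// sample is parsed only, never compiled, so undefined names are fine here.
const sample = `package p

import "net/http"

func OnboardAccount(w http.ResponseWriter, r *http.Request) {
	go sendEmail(r.Context(), "accountID")
}

func OnboardAccountFixed(w http.ResponseWriter, r *http.Request) {
	go sendEmail(context.WithoutCancel(r.Context()), "accountID")
}
`

// findRequestContextInGo returns the line numbers of go statements whose call
// takes the result of a zero-argument .Context() method call as an argument.
func findRequestContextInGo(src string) ([]int, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "src.go", src, 0)
	if err != nil {
		return nil, err
	}
	var lines []int
	ast.Inspect(file, func(n ast.Node) bool {
		goStmt, ok := n.(*ast.GoStmt)
		if !ok {
			return true
		}
		for _, arg := range goStmt.Call.Args {
			call, ok := arg.(*ast.CallExpr)
			if !ok {
				continue
			}
			sel, ok := call.Fun.(*ast.SelectorExpr)
			if ok && sel.Sel.Name == "Context" && len(call.Args) == 0 {
				lines = append(lines, fset.Position(goStmt.Pos()).Line)
			}
		}
		return true
	})
	return lines, nil
}

// flaggedLines runs the check on the sample source and formats the result.
func flaggedLines() string {
	lines, err := findRequestContextInGo(sample)
	if err != nil {
		return err.Error()
	}
	return fmt.Sprint(lines)
}

func main() {
	fmt.Println(flaggedLines()) // [6]
}
```

Only the first handler is flagged; the second wraps the request context in context.WithoutCancel before handing it to the goroutine. A real linter would need type information to avoid false positives on unrelated Context() methods.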
So, there you have it. This repository is your current best bet for catching cases where a context is propagated without detaching its cancellation. If you’re interested in checking out Cloud, you can start a free trial now – you’ll be ready to go in just a few minutes.