Event polling
Crunchy Bridge doesn't provide webhooks, but it does have a real-time event API that can be polled for changes on a cluster or team. This page describes how to design a poll loop that runs efficiently and avoids pitfalls like missed events.
The event API is used through the event list endpoint, which filters events using a cluster_id or team_id parameter.
curl -X GET -H "Authorization: Bearer cbkey_123" "https://api.crunchybridge.com/events?cluster_id=d56sve3hpjfatlxpxhe6gfkzpi&limit=10&order=desc"
Polling best practices
- Provision an API key dedicated to the specific polling task and named after the program that uses it. This enables easy key rotation if necessary (because nothing else is using it), and makes the key less likely to be deleted accidentally (because its well-formed name tells people what it's for).
- Use pagination to keep track of the last seen event so only new events are returned for subsequent fetches.
- Poll the event API frequently, but avoid polling with hyper frequency. Once every 15 or 30 seconds strikes a good balance between seeing new events in a timely manner and being a considerate API consumer.
- Request events with delay=10s to avoid missed events that may otherwise occur from out-of-order transaction commits. See stream delay for more information.
Stream delay
Consuming the event stream with a short delay is recommended because occasionally events may appear near the stream's end, but not exactly at its end. When combined with using the last seen event as a cursor and unlucky timing on a poll loop, this might cause a consumer to miss an event.
Out-of-order events happen for two reasons:
- Like our customers, Crunchy Bridge runs on Postgres. Each API request runs in a transaction, and records inserted in a transaction aren't visible to others until it commits. A transaction may generate an event earlier than another transaction but commit later, thereby only revealing its event after other events that were created later but committed sooner.
  For example, tx1 generates e1 (event 1), then tx2 opens and generates e2. tx2 commits first, making e2 available in the event API. A consumer makes a fetch and sees e2. tx1 commits and e1 becomes visible, but too late: the consumer is already using e2 as its cursor.
- Event IDs are ULIDs (similar to a UUIDv7), which encode the millisecond at which they were generated along with an 80-bit random component. Two IDs generated in exactly the same millisecond may sort out of order relative to the precise moments they were generated, if the later ID happens to get a smaller random component by luck of the random number generator.
To avoid missing events in a poll loop, it's recommended that consumers use a slight delay to give all in-flight transactions a chance to commit. The overwhelming majority of transactions in Bridge's API take less than a second, but to protect against outliers, it's better to use a larger delay like delay=10s. The maximum duration of a Bridge API request is 30 seconds, so consumers that are willing to tolerate some lag in return for the strongest guarantee that no events will be missed can use delay=30s.
High level program flow
A language-agnostic flow for how an event polling loop should work:
- Do an initial fetch to get the latest events (use descending order like order=desc). Process them if desired. If you don't need to process old events, use limit=1 to fetch only the latest one.
- Get the latest event's ID for use as a cursor. Because the page was fetched in descending order, the latest event is in the first array position.
- In a loop:
  - Fetch events using the last event ID as a cursor with cursor=<id>. This time the fetch should be in ascending order, either by using order=asc or by omitting the order parameter. Process any events that are returned.
  - Get the latest event's ID for use as a cursor on the next loop iteration.
  - Check the page's has_more property to see if the page hit its size limit and there are more events to fetch.
    - If true (quite unlikely given that Bridge event volumes tend to be quite low), repeat the loop immediately.
    - If false, sleep a reasonable amount of time (e.g. 15 or 30 seconds) and continue the loop.
Listing events uses a GET endpoint, and all parameters mentioned above should be query parameters (as opposed to form-encoded or JSON).
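For instance, a cursored fetch path can be assembled with Ruby's standard URI helpers. The cluster ID is the one from the curl example above; the cursor value is a placeholder ULID standing in for a last seen event ID:

```ruby
require "uri"

# Build the query string for a cursored fetch. All event list parameters
# are query parameters on the GET request.
params = {
  "cluster_id" => "d56sve3hpjfatlxpxhe6gfkzpi",
  "cursor"     => "01ARZ3NDEKTSV4RRFFQ69G5FAV", # placeholder last seen event ID
  "delay"      => "10s",
}
path = "/events?" + URI.encode_www_form(params)
puts path
# => /events?cluster_id=d56sve3hpjfatlxpxhe6gfkzpi&cursor=01ARZ3NDEKTSV4RRFFQ69G5FAV&delay=10s
```

URI.encode_www_form also takes care of percent-encoding, should a parameter value ever need it.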
Example program in Ruby
Here's a small Ruby program that implements the routine described above:
require "json"
require "net/http"
require "uri"

# Checks that an HTTP response has a status code indicating success and parses
# its JSON body to a hash.
def check_and_parse(resp)
  status = resp.code.to_i
  if status < 200 || status >= 400
    raise "got non-success response code #{status}: #{resp.body}"
  end

  JSON.parse(resp.body)
end

api_key = ENV["CRUNCHY_API_KEY"] || abort("need CRUNCHY_API_KEY")
api_url = ENV["CRUNCHY_API_URL"] || abort("need CRUNCHY_API_URL")

headers = {
  "Authorization" => "Bearer #{api_key}"
}

parsed_api_url = URI.parse(api_url)
client = Net::HTTP.new(parsed_api_url.host, parsed_api_url.port)
client.use_ssl = true

cluster_or_team_id = case
  when ENV["CLUSTER_ID"] then "cluster_id=#{ENV["CLUSTER_ID"]}"
  when ENV["TEAM_ID"] then "team_id=#{ENV["TEAM_ID"]}"
  else abort("need CLUSTER_ID or TEAM_ID")
end

# Do an initial fetch with `order=desc` to get the latest event. We'll use that
# as a reference to check for new events in the loop below.
data = check_and_parse(client.request(
  Net::HTTP::Get.new("/events?#{cluster_or_team_id}&delay=10s&limit=1&order=desc", headers)
))

# ID of the first element in the `events` array
last_event_id = data.dig("events", 0, "id")

puts "last event ID: #{last_event_id}"

loop do
  # If `last_event_id` is `nil`, it gets encoded as an empty string and the
  # API will interpret that as no cursor.
  data = check_and_parse(client.request(
    Net::HTTP::Get.new("/events?#{cluster_or_team_id}&cursor=#{last_event_id}&delay=10s", headers)
  ))

  if data["events"].empty?
    puts "no new events"
  else
    data["events"].each do |event|
      puts "got new event of kind: #{event["kind"]}"
    end

    last_event_id = data["events"].last["id"]

    # If there were more new events than what could fit on the page (unlikely,
    # but possible), repeat the loop immediately.
    next if data["has_more"]
  end

  # Sleep between fetches so the API isn't being hammered too hard.
  sleep(15)
end
Sample run:
$ TEAM_ID=qvcw4hylovgyzbwzp53bmmlhga ruby watch_events.rb
last event ID:
no new events
got new event of kind: cluster.created
no new events
got new event of kind: cluster.destroyed