Write the business logic
With the topic resources defined in the last step, it’s time to write a simple Kafka Streams topology to perform the business logic of this service.
The service will search each tweet’s text for occurrences of Twitter handles, e.g. @katyperry.
For each handle found, it will produce a record mapping the Twitter handle to its number of occurrences.
For example, if a tweet contained the handle @katyperry twice, it would produce a record with a key of @katyperry and a value of 2.
Define the stream topology
The aggregate template provided a shell TopologyBuilder class.
Flesh out the class’s build method to match what’s below:
ProTip: The example code deliberately names each step in the topology. This is good practice. Relying on default naming can result in topology evolution issues in the future. Internal store and topic names incorporate the step name. With default naming, the name of a step, and hence the store or topic, can change if steps are added or removed. This can lead to unintentional changes in internal topic names. If such a change was deployed, any unprocessed messages in the old topics would be skipped.
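To illustrate why default naming is fragile, here is a toy model that names each step from a shared counter. It is only loosely modelled on how generated names behave and is not Kafka Streams code: the class StepNamingDemo, the method defaultNames, and the operation labels are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class StepNamingDemo {

    // Toy model of default naming (an assumption for illustration only):
    // each step is named "<operation>-<index>", where the index comes from
    // a counter shared by every step in the topology.
    static List<String> defaultNames(final List<String> operations) {
        final List<String> names = new ArrayList<>();
        int index = 0;
        for (final String operation : operations) {
            names.add(String.format(Locale.ROOT, "%s-%010d", operation, index));
            index++;
        }
        return names;
    }

    public static void main(final String[] args) {
        // Version 1 of a hypothetical topology:
        System.out.println(defaultNames(List.of("SOURCE", "AGGREGATE", "SINK")));
        // Version 2 adds a filter step, which shifts the index of every later step:
        System.out.println(defaultNames(List.of("SOURCE", "FILTER", "AGGREGATE", "SINK")));
    }
}
```

In this model the aggregate step gets a different generated name in the second version, so any store or topic name derived from it would change too. An explicitly chosen name would stay the same in both versions.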
The above topology consumes the TweetTextStream we defined in the service’s descriptor, transforms it in the extractHandles method, and produces any output to the TweetHandleUsageStream.
The Creek Kafka Streams extension provides type-safe access to the input and output topic metadata, serde, and Kafka cluster properties, allowing engineers and the code to focus on the business logic.
As a single input record can result in zero or more output records, depending on the occurrences of Twitter handles in the tweet text, we use the flatMap method to invoke the extractHandles method.
The details of the extractHandles method aren’t particularly important in the context of demonstrating Creek’s functionality.
A simple solution might look like this:
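As a minimal, standalone sketch of the counting logic, assuming a handle is an @ followed by one or more word characters, something along these lines would work. It returns Map.Entry pairs rather than Kafka Streams’ KeyValue so that it runs with only the JDK; the class name HandleExtractor and the regular expression are assumptions for illustration, not taken from the service.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HandleExtractor {

    // Assumption: a handle is '@' followed by one or more word characters.
    // This is a simplification: it also matches the domain part of an email address.
    private static final Pattern HANDLE = Pattern.compile("@\\w+");

    // Maps one tweet to zero or more (handle, occurrences) pairs.
    // The tweet id is accepted to mirror a key/value mapper, but is not used.
    static List<Map.Entry<String, Integer>> extractHandles(
            final Long tweetId, final String tweetText) {
        // LinkedHashMap keeps handles in order of first appearance.
        final Map<String, Integer> counts = new LinkedHashMap<>();
        final Matcher matcher = HANDLE.matcher(tweetText);
        while (matcher.find()) {
            // Handles are compared case-sensitively in this sketch.
            counts.merge(matcher.group(), 1, Integer::sum);
        }
        return new ArrayList<>(counts.entrySet());
    }

    public static void main(final String[] args) {
        // Prints one entry per distinct handle found in the text.
        System.out.println(extractHandles(1L, "@katyperry loves @katyperry and @bob"));
    }
}
```

In the service itself, each pair would be returned as a KeyValue so that flatMap can emit it as an output record.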
…and that’s the production code of the service complete!
ProTip: The Name instance defined in the TopologyBuilder doesn’t add much in this example, but as topologies get more complex and are broken down into multiple builder classes, it really comes into its own.
Check out its JavaDoc to see how it can be used to help avoid topology node name clashes.