We need streaming to back up data and to monitor our systems for anomalies. Sometimes we have to derive new streams from the original ones, or, depending on business needs, drip-feed events into columnar or NoSQL databases. A variety of schema technologies are known as data serialization systems, and we have already covered some popular ones in previous tutorials. Today we will talk about Avro, one of the most mature serialization systems.
Apache Avro:
- It is developed as part of the Apache Hadoop project
- Its schema language is JSON-based
- Avro IDL (Interface Definition Language) syntax is similar to C
- Avro has two representations: a human-readable JSON encoding and a compact binary encoding
- Avro uses .avsc files, known as Avro schemas
- Compared with other data serialization systems, its support across programming languages is more limited
Confluent is a data streaming platform based on Apache Kafka. It provides publish/subscribe messaging as well as storage and processing of streams. For this tutorial, we will need the Confluent Platform to make our Spring Boot applications work. Make sure that you have Docker installed on your machine. We will use a docker-compose file, which you can download from GitHub. More information is available at Confluent.
According to avro.apache.org, the Avro schema primitive types are:
- null: no value
- boolean: a binary value
- int: 32-bit signed integer
- long: 64-bit signed integer
- float: single precision (32-bit)
- double: double precision (64-bit)
- bytes: sequence of 8-bit unsigned bytes
- string: unicode character sequence
First, we will create a student.avsc file. The package namespace declared here is important: we will use the generated Student class within our applications under that namespace.
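As a sketch, a minimal student.avsc could look like the following. The namespace and field names here are illustrative assumptions; use the package namespace your applications actually need:

```json
{
  "type": "record",
  "name": "Student",
  "namespace": "com.example.myavro",
  "fields": [
    { "name": "id", "type": "int" },
    { "name": "name", "type": "string" },
    { "name": "grade", "type": "double" }
  ]
}
```

Note that the fields only use Avro primitive types from the list above.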
We will now run docker-compose up from the command prompt to bring our containers up. Please wait and make sure that all the containers are running properly. Some of them can fail at first, so you may have to restart those containers.
We will be creating two Spring Boot applications: one is the Producer application and the other is the Consumer application. For the Producer application, we will create a directory named myavro and place student.avsc into that folder.
We need the avro-maven-plugin to generate the Java classes, and we need to tell Maven where our student.avsc file lives.
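A plugin configuration along these lines can be added to pom.xml; the version number and directory paths are assumptions to adjust for your project layout:

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.11.1</version>
  <executions>
    <execution>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
      </goals>
      <configuration>
        <!-- Where the .avsc schema files live -->
        <sourceDirectory>${project.basedir}/myavro/</sourceDirectory>
        <!-- Where the generated Java classes should go -->
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```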
So please add the necessary dependencies to pom.xml, then run mvn clean and mvn compile in sequence. This will generate the Student class.
Next, we will create a KafkaProducerService to send our messages to Kafka. It will send each message to our topic with a key and a value. For a better understanding, we hardcoded the "students" topic name, but you can also read that value from the application.properties file.
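A minimal sketch of such a service, assuming the Student class generated from student.avsc and the KafkaTemplate from spring-kafka:

```java
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

@Service
public class KafkaProducerService {

    // Hardcoded for clarity; could instead be injected from application.properties
    private static final String TOPIC = "students";

    private final KafkaTemplate<String, Student> kafkaTemplate;

    public KafkaProducerService(KafkaTemplate<String, Student> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void send(String key, Student student) {
        // Key and value are serialized by the serializers configured in
        // application.properties (KafkaAvroSerializer for the Student value)
        kafkaTemplate.send(TOPIC, key, student);
    }
}
```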
We will define some properties in the application.properties file. The topic name will be "students" and the group id will be "groupid". The producer key and value serializers are also defined there, and the dependencies are declared in our pom.xml.
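The producer configuration could look roughly like this; the broker and Schema Registry ports are assumed to be the defaults from the Confluent docker-compose file:

```properties
# Application port used by the producer REST endpoint
server.port=9393
topic.name=students
spring.kafka.bootstrap-servers=localhost:9092
spring.kafka.producer.key-serializer=org.apache.kafka.common.serialization.StringSerializer
# Avro serializer from Confluent; registers the schema with Schema Registry
spring.kafka.producer.value-serializer=io.confluent.kafka.serializers.KafkaAvroSerializer
spring.kafka.properties.schema.registry.url=http://localhost:8081
```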
The ProducerController is created to generate some random values when the URL http://localhost:9393/mystudent/init is requested.
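A sketch of the controller, assuming the hypothetical Student fields (id, name, grade) and the KafkaProducerService described above:

```java
import java.util.Random;
import java.util.UUID;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
public class ProducerController {

    private final KafkaProducerService producerService;
    private final Random random = new Random();

    public ProducerController(KafkaProducerService producerService) {
        this.producerService = producerService;
    }

    @GetMapping("/mystudent/init")
    public String init() {
        // Build a Student with random values using the builder
        // generated by the avro-maven-plugin, then publish it
        Student student = Student.newBuilder()
                .setId(random.nextInt(1000))
                .setName("student-" + random.nextInt(100))
                .setGrade(random.nextDouble() * 100)
                .build();
        producerService.send(UUID.randomUUID().toString(), student);
        return "message sent";
    }
}
```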
Now we are ready to run our producer application. When we open the URL (http://localhost:9393/mystudent/init) we send the message, and we can track it from Confluent Control Center, which is used to manage and monitor Apache Kafka. You can open Control Center at http://localhost:9021/clusters.
You can check the schema file from the "students" schema menu. You can also delete the schema or change its compatibility mode from here.
Now we have created the producer application successfully, and we are ready for the Consumer part. For the Consumer application we will place the myavro directory under src\main, add the avro-maven-plugin and the necessary dependencies to our pom.xml, and generate our Student class just as we did in the Producer application. This time we will keep the properties in an application.yaml file. As this application will deserialize our data, we will use key and value deserializer classes. Also, our application will run on port 9392.
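The consumer's application.yaml could be sketched as follows, again assuming the default local ports; specific.avro.reader tells the Avro deserializer to produce the generated Student class rather than a generic record:

```yaml
server:
  port: 9392
spring:
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      group-id: groupid
      key-deserializer: org.apache.kafka.common.serialization.StringDeserializer
      value-deserializer: io.confluent.kafka.serializers.KafkaAvroDeserializer
    properties:
      schema.registry.url: http://localhost:8081
      specific.avro.reader: true
```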
Next we will define the KafkaConsumerService class, which will consume the messages. With the @KafkaListener annotation, we pass the topic and groupId parameters.
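A minimal sketch of the listener, assuming the topic and group id configured above and the generated Student class:

```java
import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Service;

@Service
public class KafkaConsumerService {

    // Topic and group id match the values used by the producer;
    // the Avro deserializer hands us Student instances directly
    @KafkaListener(topics = "students", groupId = "groupid")
    public void consume(Student student) {
        System.out.println("Consumed student: " + student);
    }
}
```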
When we run our application and send several messages from our producer rest application we will get the following output.
Thanks for reading. The source code is available at GitHub.