Batch processing in modern Java applications with Spring Batch.

In software development, batch processing used to be one of the harder things to implement. These days, many frameworks and platforms offer batch processing out of the box. In this article, I will share my experience with one such tool, Spring Batch, which lets you add batch processing to an existing Spring Boot application through configuration.

Before we jump into Spring Batch, let me brief you on batch processing. The name might sound intimidating if you are new to it, but it is nothing more than processing data (transactions) in small chunks or groups without any manual interaction from the user of the application. You may ask why you would want to do that. The answer is that it improves the performance and efficiency of the application when you deal with a large dataset: records are handled a chunk at a time, so the whole dataset never has to sit in memory at once.

Now let’s understand how it works and how we are going to implement it. As an example, I’ll take a large CSV dataset from Kaggle and write its data to the in-memory database of a Spring Boot application.

Here there will be a user task (a job) whose steps are already configured and ready to execute. In this particular use case, we will use only one step, and it will have three main components: a reader, a processor, and a writer. The names are fairly self-descriptive, but we will explore each in detail.
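To make the reader, processor, and writer roles concrete, here is a minimal plain-Java sketch of one step. It deliberately uses no Spring Batch types: StepSketch and runStep are hypothetical names, the reader is just an Iterator, and the "database" is an in-memory list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java illustration only: these are NOT Spring Batch types.
final class StepSketch {

    // Pulls items from the reader, transforms each one, and collects what the writer received.
    static <I, O> List<O> runStep(Iterator<I> reader, Function<I, O> processor) {
        List<O> written = new ArrayList<>();   // stands in for the database
        while (reader.hasNext()) {
            I item = reader.next();            // reader: supply the next raw record
            O result = processor.apply(item);  // processor: transform one record
            written.add(result);               // writer: persist the result
        }
        return written;
    }

    public static void main(String[] args) {
        List<String> rawLines = List.of(" dune ", "emma ");
        List<String> out = runStep(rawLines.iterator(), String::trim);
        System.out.println(out); // prints [dune, emma]
    }
}
```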

Now let’s implement this simple use case together. I’ll attach a few code snippets and explain them step by step, but feel free to clone the GitHub repository and play with it: https://github.com/vithushanms/spring_batch.git

The dataset being used in this example can be downloaded from here: https://github.com/vithushanms/spring_batch/blob/main/demo-batch-processing/src/main/resources/books.csv

As a first step, create a Spring Boot application with the following dependencies.

Spring Batch (I/O)
Spring Data JPA
HSQLDB

After you have created the project, move the CSV file you downloaded into the resources folder (src/main/resources).

Now create the data models: one for the input data and one for the data you are going to write to the database.

The BookInput model above mirrors the CSV data exactly as it is represented in the file, and the model below describes how we are going to write it to the database.
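As a rough plain-Java sketch of these two shapes, assume the CSV has only three columns. The field names here (id, title, authors) are hypothetical and chosen for illustration; the real ones depend on the columns in your books.csv. In the actual project the Book class would also carry the JPA annotations discussed next.

```java
// Hypothetical field names; the real columns depend on the CSV you downloaded.
final class ModelSketch {

    // Mirrors one CSV row: every column arrives as raw text.
    static class BookInput {
        private String id;
        private String title;
        private String authors;

        public String getId() { return id; }
        public void setId(String id) { this.id = id; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public String getAuthors() { return authors; }
        public void setAuthors(String authors) { this.authors = authors; }
    }

    // Shape written to the database, with a typed id instead of raw text.
    static class Book {
        private long id;
        private String title;
        private String author;

        public long getId() { return id; }
        public void setId(long id) { this.id = id; }
        public String getTitle() { return title; }
        public void setTitle(String title) { this.title = title; }
        public String getAuthor() { return author; }
        public void setAuthor(String author) { this.author = author; }
    }

    public static void main(String[] args) {
        BookInput input = new BookInput();
        input.setId("1");
        input.setTitle("Dune");
        System.out.println(input.getId() + " " + input.getTitle());
    }
}
```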

I’ve also used a few JPA annotations, @Entity and @Id. These create the table automatically and mark the primary key when we run the Spring Boot application. Now let’s create the actual item processor, which describes how a single BookInput is transformed into a Book.

Here the BookProcessor class above implements ItemProcessor, which declares the process method. We override it to transform the data in our pipeline. In simple words, it defines how one input record is converted into one book.
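A minimal sketch of such a processor, written without Spring on the classpath: the nested ItemProcessor interface below is a stand-in with the same shape as Spring Batch's, and the two model classes are cut down to hypothetical id and title fields. The conversion logic (trim the text, parse the id) is an assumption for illustration, not necessarily what the repository does.

```java
final class ProcessorSketch {

    // Stand-in with the same shape as Spring Batch's ItemProcessor<I, O>.
    interface ItemProcessor<I, O> {
        O process(I item) throws Exception;
    }

    // Hypothetical, trimmed-down input model: raw text from the CSV.
    static final class BookInput {
        final String id;
        final String title;

        BookInput(String id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Hypothetical, trimmed-down output model.
    static final class Book {
        final long id;
        final String title;

        Book(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Converts exactly one BookInput into exactly one Book.
    static final class BookProcessor implements ItemProcessor<BookInput, Book> {
        @Override
        public Book process(BookInput input) {
            long id = Long.parseLong(input.id.trim()); // raw text to a typed id
            return new Book(id, input.title.trim());
        }
    }

    public static void main(String[] args) {
        Book book = new BookProcessor().process(new BookInput(" 7 ", " Dune "));
        System.out.println(book.id + " " + book.title); // prints 7 Dune
    }
}
```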

Now we have the input model, the expected output model, and the item processor. Next we need to configure the flow.

So, I have created a configuration class, BatchConfiguration, and annotated it with @EnableBatchProcessing from the framework, which saves a lot of manual work such as setting up a memory database. I have also autowired a couple of factories here, which will be used later. Along with that, I have a bean that returns an instance of the item processor (BookProcessor) we created above. Now let’s introduce a few more beans to the configuration.

reader() configures how data is read from the source. In it, I build and return an instance of FlatFileItemReader. While building it, I name it “bookItemreader”, point it at the source CSV on the resource path, and list all the columns in the dataset to be read. Finally, I map each row to a model class so that objects are created from it.
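What the reader does with each line can be sketched in plain Java as follows. This is not FlatFileItemReader itself, only an illustration of pairing comma-separated values with configured column names; the column names are hypothetical, and the naive split does not handle quoted fields that contain commas.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration of "tokenize a line, then name each column".
final class ReaderSketch {

    // Hypothetical column names, listed in the same order as in the file.
    static final String[] NAMES = {"id", "title", "authors"};

    // Splits one CSV line on commas and pairs each value with its column name.
    static Map<String, String> readLine(String line) {
        String[] tokens = line.split(",", -1); // -1 keeps trailing empty columns
        if (tokens.length != NAMES.length) {
            throw new IllegalArgumentException(
                "Expected " + NAMES.length + " columns but found " + tokens.length);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            fields.put(NAMES[i], tokens[i]);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(readLine("1,Dune,Frank Herbert"));
        // prints {id=1, title=Dune, authors=Frank Herbert}
    }
}
```

In the real reader, the named values are then copied onto a BookInput object, which is why the configured column names need to match the model's property names.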

writer() configures how the data is written to the database. Here I build and return a JdbcBatchItemWriter. You may ask how I can run an insert query without actually creating the table. That’s where JPA comes into the picture: I used the @Entity annotation when creating the Book class, so the table is created automatically in the in-memory database while the application is starting.
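A JDBC batch writer is commonly given an INSERT statement with named placeholders that are filled from each item's properties. The plain-Java sketch below shows only that binding idea, with no database involved. The table name, column names, and the toBatchParameters helper are all hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration: how a chunk of items becomes a batch of insert parameters.
final class WriterSketch {

    // Hypothetical table and column names.
    static final String SQL = "INSERT INTO book (id, title) VALUES (:id, :title)";

    // Hypothetical, trimmed-down output model.
    static final class Book {
        final long id;
        final String title;

        Book(long id, String title) {
            this.id = id;
            this.title = title;
        }
    }

    // Builds one parameter map per item; each :name placeholder gets the matching field.
    static List<Map<String, Object>> toBatchParameters(List<Book> chunk) {
        List<Map<String, Object>> batch = new ArrayList<>();
        for (Book book : chunk) {
            Map<String, Object> params = new LinkedHashMap<>();
            params.put("id", book.id);
            params.put("title", book.title);
            batch.add(params);
        }
        return batch;
    }

    public static void main(String[] args) {
        List<Book> chunk = List.of(new Book(1L, "Dune"), new Book(2L, "Emma"));
        System.out.println(SQL);
        System.out.println(toBatchParameters(chunk));
        // second line prints [{id=1, title=Dune}, {id=2, title=Emma}]
    }
}
```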

To organize everything done so far, let’s bring steps into the picture. Steps are basically instructions for how to build the entire pipeline in batch processing. In this particular example, we need only one step.

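In Step1() I have set the number of records to be taken as a chunk. This means that during batch processing, records are read and processed one at a time until a chunk is full (20 records in this example), and the whole chunk is then handed to the writer and committed together. Chunking by itself does not make the processing run in parallel.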
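The chunk loop can be sketched in plain Java like this. ChunkSketch and chunkSizes are hypothetical names and the data is made up; a real step additionally wraps each chunk in a transaction, which is left out here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Plain-Java illustration of chunk-oriented processing.
final class ChunkSketch {

    // Returns the size of every chunk that would be handed to the writer.
    static List<Integer> chunkSizes(List<String> items, int chunkSize) {
        List<Integer> writes = new ArrayList<>();
        List<String> chunk = new ArrayList<>();
        for (String item : items) {            // read one item at a time
            chunk.add(item.trim());            // process it and add it to the current chunk
            if (chunk.size() == chunkSize) {   // chunk is full:
                writes.add(chunk.size());      // write the whole chunk in one go
                chunk = new ArrayList<>();     // and start a new one
            }
        }
        if (!chunk.isEmpty()) {
            writes.add(chunk.size());          // the last chunk may be smaller
        }
        return writes;
    }

    public static void main(String[] args) {
        List<String> rows = Collections.nCopies(45, " row ");
        System.out.println(chunkSizes(rows, 20)); // prints [20, 20, 5]
    }
}
```

With 45 records and a chunk size of 20, the writer is called three times: twice with a full chunk and once with the remaining 5 records.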

Now we need to set up some logging to monitor the progress of the batch processing.

Here I am extending JobExecutionListenerSupport, which already provides empty implementations of the listener callbacks, and overriding only what I need for logging. I am also just printing to the console rather than setting up proper logging, as this is only a demo application.

Now let’s think about executing our configured steps. Let’s go back to BatchConfiguration.java and add the execution flow.

importUserJob() has been created to build the flow of our steps. In terms of setup, we have now completed everything, but there is one more important task. Check whether your dataset contains a row of labels (a header row) in the file. If so, please remove that row; otherwise the reader will try to treat it as a record.

As you can see in the screenshot above, my dataset has a label row that needs to be removed. Now let’s remove it and run the application!
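If you would rather leave the file untouched, the same effect can be had by skipping the first line while reading. Spring Batch's FlatFileItemReaderBuilder has a linesToSkip option for this purpose. The plain-Java sketch below only illustrates the idea; HeaderSkipSketch and dataRows are hypothetical names, and the header text is made up.

```java
import java.util.List;
import java.util.stream.Collectors;

// Plain-Java illustration of skipping a header row instead of deleting it.
final class HeaderSkipSketch {

    // Drops the first linesToSkip lines and returns the remaining data rows.
    static List<String> dataRows(List<String> lines, int linesToSkip) {
        return lines.stream()
                .skip(linesToSkip)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "id,title,authors",        // label row (hypothetical header)
                "1,Dune,Frank Herbert",
                "2,Emma,Jane Austen");
        System.out.println(dataRows(lines, 1));
        // prints [1,Dune,Frank Herbert, 2,Emma,Jane Austen]
    }
}
```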

As we can see in the screenshot, the dataset has been successfully imported into the in-memory database 👏. I hope this helps you get started with Spring Batch.

Thank you for reading!

Follow me if you are interested in programming and engineering topics. Please share my blogs with relevant people if you like them.

@linkedIn : msvithushan

Shubham Jaiswal, 26 May 2021 at 10:30
That's a nice post and a very descriptive blog, and it was enough to clear up my concept of a spring-batch project since I have not worked on this yet.

Just a few suggestions, and it is up to you to use them since they will do nothing but make your code look better:
1.) You can use the Lombok project to get rid of the setters and getters in your Book and BookInput models.
2.) Instead of using the old-looking println call in your JobCompletionNotificationListener.java class,
forEach(book -> System.out.println(book));
you can use a method reference:
forEach(System.out::println);

Thanks
Shubham Jaiswal

Sai, 20 January 2022 at 18:24
Can I get some example on chunk processing in a tasklet?
