
Batch processing in modern Java applications with Spring Batch.

In the early days of software development, batch processing was one of the more challenging features to implement. These days, however, frameworks and platforms offer plenty of out-of-the-box solutions for it. In this article, I will share my experience with one such tool, Spring Batch, which lets you add batch processing to your existing Spring Boot applications.

Before we jump into Spring Batch, let me briefly explain batch processing; the name might sound intimidating if you are new to it. It is nothing more than processing transactions (data) in small chunks or groups, without any manual interaction from the user of the application. You may ask why you would want to do that. The answer is that it brings significant performance and efficiency benefits when you deal with a large dataset.

Now let’s understand how it works and how we are going to implement it. First of all, I am going to pick an example to start with: I’ll take a large CSV dataset from Kaggle and write its data to the in-memory database of a Spring Boot application.
In Spring Batch there is a job, which executes the steps configured for it. In this particular use case we will use only one step, and that step will have three main components: a reader, a processor, and a writer. I believe these names are self-descriptive enough, but we will explore each of them in detail.

Now let’s implement this simple use case together. I’ll be attaching a few code snippets and explaining them step by step, but feel free to clone the GitHub repository and play with it from the following URL: https://github.com/vithushanms/spring_batch.git

The dataset being used in this example can be downloaded from here: https://github.com/vithushanms/spring_batch/blob/main/demo-batch-processing/src/main/resources/books.csv 

As a first step, create a Spring Boot application with the following dependencies:
  1. Spring Batch
  2. Spring Data JPA
  3. HSQLDB
After you have created the project, move the CSV you downloaded into the resources folder.
Now you have to create the data models, one for the input data and one for the data you are going to write to the database.
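A minimal sketch of the two models is shown below. The field names (id, title, author) are assumptions for illustration; the real Kaggle dataset will have its own columns, so adjust the fields accordingly.

```java
// BookInput.java - a plain POJO that mirrors the CSV columns one-to-one
// (field names here are assumed, not taken from the actual dataset)
public class BookInput {
    private String id;
    private String title;
    private String author;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }
}

// Book.java - the JPA entity written to the database; @Entity and @Id
// let the framework create the table and primary key automatically
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Book {
    @Id
    private Long id;
    private String title;
    private String author;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
    public String getAuthor() { return author; }
    public void setAuthor(String author) { this.author = author; }
}
```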
The BookInput model above describes exactly how the data is represented in the CSV, and the Book model describes how we are going to write it to the database.
I’ve also used a couple of annotations from JPA, @Entity and @Id. These let the framework create the table automatically and identify the primary key when the Spring Boot application starts. Now let’s create the actual item processor, which describes how a single BookInput is transformed into a Book.
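A sketch of the processor, assuming the BookInput/Book field names used above:

```java
import org.springframework.batch.item.ItemProcessor;

// Transforms one CSV record (BookInput) into one database entity (Book)
public class BookProcessor implements ItemProcessor<BookInput, Book> {

    @Override
    public Book process(final BookInput input) {
        Book book = new Book();
        // the CSV id arrives as a String; the entity key is numeric
        book.setId(Long.parseLong(input.getId()));
        book.setTitle(input.getTitle());
        book.setAuthor(input.getAuthor());
        return book;
    }
}
```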
The BookProcessor class implements the ItemProcessor interface, which declares the process method; we override it to transform the data in our pipeline. In simple words, it defines how to take one single input record and convert it into one single Book.

Now we have the input model, the expected output model, and the item processor. Next we need to configure the flow.
I have created a configuration class, BatchConfiguration, annotated with @EnableBatchProcessing from the framework, which saves a lot of manual work by setting up the batch infrastructure (such as the job repository) for us. I have also autowired a couple of builder factories that will be used in the latter part, along with a bean that provides the instance of the item processor (BookProcessor) we created above. Now let’s introduce a few more beans to the configuration.
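A sketch of the configuration class skeleton, in the pre-Spring-Batch-5 style that matches this walkthrough (JobBuilderFactory and StepBuilderFactory are provided by @EnableBatchProcessing):

```java
import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    // builder factories used to construct the step and job beans
    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    // expose our item processor as a bean
    @Bean
    public BookProcessor processor() {
        return new BookProcessor();
    }

    // the reader, writer, step, and job beans described in the article go here
}
```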
reader() has been added to configure how the data is read from the source. Here I build and return an instance of FlatFileItemReader: I name it “bookItemReader”, point it at the source CSV on the resources path, list all the columns in the dataset to be read, and then map each row to a model to create the objects.
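A sketch of the reader bean, following the standard FlatFileItemReaderBuilder idiom; the column names are the assumed ones from the model sketch:

```java
@Bean
public FlatFileItemReader<BookInput> reader() {
    return new FlatFileItemReaderBuilder<BookInput>()
        .name("bookItemReader")
        .resource(new ClassPathResource("books.csv"))
        .delimited()
        .names(new String[]{"id", "title", "author"}) // assumed column names
        // map each parsed line onto a BookInput via its setters
        .fieldSetMapper(new BeanWrapperFieldSetMapper<BookInput>() {{
            setTargetType(BookInput.class);
        }})
        .build();
}
```

As a side note, if your CSV keeps its header row, FlatFileItemReaderBuilder also offers linesToSkip(1) as an alternative to deleting the row by hand.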
writer() has been added to configure how the data is written into the DB. Here I build and return a JdbcBatchItemWriter. You may wonder how the insert query can work without the table ever being created explicitly. That’s where JPA comes into the picture: since I used the @Entity annotation when creating the Book class, the table is created automatically in the in-memory database while the application is starting.
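A sketch of the writer bean; the table and column names are assumptions matching the entity sketch above:

```java
@Bean
public JdbcBatchItemWriter<Book> writer(final DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Book>()
        // bind the named SQL parameters (:id, :title, :author)
        // to the matching bean properties of each Book
        .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
        .sql("INSERT INTO book (id, title, author) VALUES (:id, :title, :author)")
        .dataSource(dataSource)
        .build();
}
```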

To organize everything we have done so far, let’s bring steps into the picture. Steps are essentially the instructions for building the entire batch-processing pipeline. In this particular example, we need only one step.
In step1() I have configured the number of records to be taken as a chunk. This means that during batch processing, records are read and processed one at a time and then written together as a group (20 records in this example) within a single transaction.
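The step bean wiring the reader, processor, and writer together can be sketched like this:

```java
@Bean
public Step step1(JdbcBatchItemWriter<Book> writer) {
    return stepBuilderFactory.get("step1")
        // read/process 20 records, then write them in one transaction
        .<BookInput, Book>chunk(20)
        .reader(reader())
        .processor(processor())
        .writer(writer)
        .build();
}
```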

Now we need to prepare logging so we can monitor the progress of the batch job.
Here I am extending JobExecutionListenerSupport, which already provides a default implementation, and overriding it to log the way I want. I am simply printing to the console rather than setting up proper logging, since this is just a demo application.
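A sketch of the listener; the verification query assumes the table and column names used in the writer sketch:

```java
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    private final JdbcTemplate jdbcTemplate;

    public JobCompletionNotificationListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.out.println("JOB FINISHED! Verifying the imported rows:");
            // print each imported title to the console (demo-only logging)
            jdbcTemplate.queryForList("SELECT title FROM book", String.class)
                        .forEach(System.out::println);
        }
    }
}
```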

Now let’s think about executing our configured step. Let’s go back to BatchConfiguration.java and add the execution flow.
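The job bean tying the step and the listener together can be sketched as:

```java
@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
    return jobBuilderFactory.get("importUserJob")
        // give each run a new run id so the job can be re-executed
        .incrementer(new RunIdIncrementer())
        .listener(listener)
        .flow(step1)
        .end()
        .build();
}
```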
importUserJob() has been created to build the flow of our steps. In terms of setup, we have now completed everything. But there is one more important task: check whether your dataset file contains a header row of column labels. If so, remove that row.
As you can see in the screenshot above, my dataset has a header row that needs to be removed. Let’s remove it and run the application!
As we can see in the screenshot, the dataset has been successfully imported into the in-memory database 👏. I hope this helps you get started with Spring Batch.

Thank you for reading! 

Follow me if you are interested in programming and engineering topics, and please share my blogs with anyone who might find them useful.

LinkedIn: msvithushan

Comments

  1. That's a nice and very descriptive blog post, and it was enough to clear up my understanding of Spring Batch, since I have not worked with it yet.

    Just a few suggestions, and it is up to you whether to use them, since they will do nothing but make your code look better:
    1.) You can use the Lombok project to get rid of the setters and getters in your Book and BookInput models.
    2.) Instead of the old-looking println call in your JobCompletionNotificationListener.java class, forEach(book -> System.out.println(book));
    you can use a method reference:
    forEach(System.out::println);

    Thanks
    Shubham Jaiswal

  2. Can I get some examples of chunk processing in a tasklet?


