In the world of software development, batch processing was one of the more challenging things to implement in the early days. These days, however, plenty of out-of-the-box solutions are available in frameworks and platforms. In this article, I will share my experience with one such tool, Spring Batch, which lets you add batch processing to your existing Spring Boot applications.
Before we jump into Spring Batch, let me briefly explain batch processing. The name might sound intimidating if you are new to it, but it is nothing more than processing transactions (data) in small chunks or groups, without any manual interaction from the user of the application. Why would you want to do that? Because it brings significant benefits in performance and efficiency when you deal with a large dataset.
Now let's understand how it works and how we are going to implement it. To start with, I'll take a large CSV dataset from Kaggle and write its data to the in-memory database of a Spring Boot application.

Here, there will be a job that has steps configured to be executed. In this particular use case we will use only one step, and it will have three main components: a reader, a processor, and a writer. I believe these names are self-descriptive enough, but we will explore each in detail.
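To make the reader, processor, and writer idea concrete before we touch Spring at all, here is a minimal plain-Java sketch of that pipeline. All names here (PipelineSketch, the sample rows) are hypothetical and for illustration only; the real framework components come later in the article.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java sketch (no Spring) of the reader -> processor -> writer flow.
public class PipelineSketch {
    // Read: here the "source" is just an in-memory list of CSV-like rows.
    static List<String> read() {
        return List.of("1,Harry Potter", "2,The Hobbit");
    }

    // Process: turn one raw record into the shape we want to store.
    static String process(String row) {
        String[] parts = row.split(",");
        return "Book#" + parts[0] + ": " + parts[1];
    }

    // Write: collect the processed records (a real writer would hit a DB).
    static List<String> run() {
        List<String> written = new ArrayList<>();
        for (String row : read()) {
            written.add(process(row));
        }
        return written;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Spring Batch gives you exactly these three seams as pluggable interfaces, plus chunking, transactions, and restartability on top.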
Now let's implement this simple use case together. I'll attach a few code snippets and explain them step by step, but feel free to clone the GitHub repository and play with it: https://github.com/vithushanms/spring_batch.git
The dataset being used in this example can be downloaded from here: https://github.com/vithushanms/spring_batch/blob/main/demo-batch-processing/src/main/resources/books.csv
As a first step, create the Spring Boot application with the following dependencies:
- Spring Batch
- Spring Data JPA
- HSQLDB
After you have created the project, move the CSV you downloaded into the resources folder.

Now you have to create the data models: one for the input data and one for the data you are going to write to the database.
```java
public class BookInput {
    private String bookID;
    private String title;
    private String authors;
    private String average_rating;
    private String isbn;
    private String isbn13;
    private String language_code;
    private String num_pages;
    private String ratings_count;
    private String text_reviews_count;
    private String publication_date;
    private String publisher;

    public String getBookID() { return bookID; }
    public void setBookID(String bookID) { this.bookID = bookID; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getAuthors() { return authors; }
    public void setAuthors(String authors) { this.authors = authors; }

    public String getAverage_rating() { return average_rating; }
    public void setAverage_rating(String average_rating) { this.average_rating = average_rating; }

    public String getIsbn() { return isbn; }
    public void setIsbn(String isbn) { this.isbn = isbn; }

    public String getIsbn13() { return isbn13; }
    public void setIsbn13(String isbn13) { this.isbn13 = isbn13; }

    public String getLanguage_code() { return language_code; }
    public void setLanguage_code(String language_code) { this.language_code = language_code; }

    public String getNum_pages() { return num_pages; }
    public void setNum_pages(String num_pages) { this.num_pages = num_pages; }

    public String getRatings_count() { return ratings_count; }
    public void setRatings_count(String ratings_count) { this.ratings_count = ratings_count; }

    public String getText_reviews_count() { return text_reviews_count; }
    public void setText_reviews_count(String text_reviews_count) { this.text_reviews_count = text_reviews_count; }

    public String getPublication_date() { return publication_date; }
    public void setPublication_date(String publication_date) { this.publication_date = publication_date; }

    public String getPublisher() { return publisher; }
    public void setPublisher(String publisher) { this.publisher = publisher; }
}
```
The BookInput model above mirrors the CSV data exactly as it is represented in the file. The model below describes what we are going to write to the database.
```java
package blog.vithushan.demobatchprocessing.Model;

import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Book {
    @Id
    private long bookID;
    private String title;
    private String authors;
    private String isbn;

    public long getBookID() { return bookID; }
    public void setBookID(long bookID) { this.bookID = bookID; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getAuthors() { return authors; }
    public void setAuthors(String authors) { this.authors = authors; }

    public String getIsbn() { return isbn; }
    public void setIsbn(String isbn) { this.isbn = isbn; }
}
```
I've also used a couple of annotations from JPA, @Entity and @Id. These let the framework create the table automatically and identify the primary key when the Spring Boot application starts. Now let's create the actual item processor, which describes how a single BookInput is transformed into a Book.
```java
package blog.vithushan.demobatchprocessing.Data;

import org.springframework.batch.item.ItemProcessor;

import blog.vithushan.demobatchprocessing.Model.Book;

public class BookProcessor implements ItemProcessor<BookInput, Book> {
    @Override
    public Book process(BookInput bookInput) throws Exception {
        Book book = new Book();
        book.setBookID(Long.parseLong(bookInput.getBookID()));
        book.setTitle(bookInput.getTitle());
        book.setAuthors(bookInput.getAuthors());
        book.setIsbn(bookInput.getIsbn());
        return book;
    }
}
```
The BookProcessor class above implements the ItemProcessor interface, which already defines the process method; we override it to transform the data in our pipeline. In simple words, it takes one single input record and converts it into one single Book.
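If you want to see the transform in isolation, here is a small plain-Java sketch of the same logic using local record types, so you can run it without a Spring context. The record names mirror the article's models; everything else is illustrative.

```java
// Plain-Java sketch of the conversion BookProcessor performs:
// parse the textual bookID into a long and copy the other fields.
public class TransformSketch {
    public record BookInput(String bookID, String title, String authors, String isbn) {}
    public record Book(long bookID, String title, String authors, String isbn) {}

    static Book process(BookInput in) {
        // Same idea as BookProcessor.process in the article.
        return new Book(Long.parseLong(in.bookID()), in.title(), in.authors(), in.isbn());
    }

    public static void main(String[] args) {
        System.out.println(process(new BookInput("1", "A Title", "Someone", "123")));
    }
}
```

Note that Long.parseLong will throw a NumberFormatException on a non-numeric id, which is one reason a stray header row in the CSV breaks the job (we come back to this at the end of the article).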
Now we have the input model, the expected output model, and the item processor. Next, we need to configure the flow.
```java
package blog.vithushan.demobatchprocessing.Data;

import org.springframework.batch.core.configuration.annotation.EnableBatchProcessing;
import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableBatchProcessing
public class BatchConfiguration {

    @Autowired
    public JobBuilderFactory jobBuilderFactory;

    @Autowired
    public StepBuilderFactory stepBuilderFactory;

    @Bean
    public BookProcessor getProcessor() {
        return new BookProcessor();
    }
}
```
So, I have created a configuration class, BatchConfiguration, annotated with @EnableBatchProcessing from the framework, which saves a lot of manual work such as setting up the batch infrastructure (the job repository and its backing store). I have also autowired a couple of builder factories that will be used later, along with a bean that provides an instance of the BookProcessor we created above. Now let's introduce a few more beans to the configuration.
```java
@Bean
public FlatFileItemReader<BookInput> reader() {
    String[] fieldNames = { "bookID", "title", "authors", "average_rating", "isbn", "isbn13",
            "language_code", "num_pages", "ratings_count", "text_reviews_count",
            "publication_date", "publisher" };
    return new FlatFileItemReaderBuilder<BookInput>()
            .name("bookItemreader")
            .resource(new ClassPathResource("./books.csv"))
            .delimited()
            .names(fieldNames)
            .fieldSetMapper(new BeanWrapperFieldSetMapper<BookInput>() {
                {
                    setTargetType(BookInput.class);
                }
            })
            .build();
}
```
reader() configures how data is read from the source. Here I build a FlatFileItemReader and return it: I name it "bookItemreader", point it at the source CSV on the resource path, declare all the columns in the dataset to be read, and finally map each row to the BookInput model to create the objects.
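Conceptually, the delimited reader does something like the following for each line: split on the delimiter and pair each value with its declared field name, which the BeanWrapperFieldSetMapper then pushes into the bean's setters. This is only an illustrative plain-Java sketch, not Spring Batch's actual implementation (which also handles quoting and escaping).

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-line tokenization in a delimited reader:
// values are paired positionally with the configured field names.
public class LineMapSketch {
    static Map<String, String> mapLine(String line, String[] fieldNames) {
        String[] values = line.split(",");
        Map<String, String> record = new HashMap<>();
        for (int i = 0; i < fieldNames.length && i < values.length; i++) {
            record.put(fieldNames[i], values[i]);
        }
        return record;
    }

    public static void main(String[] args) {
        System.out.println(mapLine("1,Harry Potter,J.K. Rowling",
                new String[] { "bookID", "title", "authors" }));
    }
}
```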
```java
@Bean
public JdbcBatchItemWriter<Book> writer(DataSource dataSource) {
    return new JdbcBatchItemWriterBuilder<Book>()
            .itemSqlParameterSourceProvider(new BeanPropertyItemSqlParameterSourceProvider<>())
            .sql("INSERT INTO book (bookID, title, authors, isbn) VALUES (:bookID, :title, :authors, :isbn)")
            .dataSource(dataSource)
            .build();
}
```
writer() configures how the data is going to be written to the DB. Here I build a JdbcBatchItemWriter and return it. You may ask how I can write the INSERT query without actually creating the table. That's where JPA comes into the picture: I used the @Entity annotation when creating the Book class, so the table is created automatically in the in-memory database while the application is starting.
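The BeanPropertyItemSqlParameterSourceProvider is what connects the :bookID, :title, etc. placeholders in the SQL to the Book object's getters. Roughly, a :title placeholder is resolved by reading the item's title property. The following is a heavily simplified, hypothetical resolver to convey the idea only; it is not how Spring's class is implemented.

```java
import java.lang.reflect.Method;

// Illustrative sketch: resolve a named SQL parameter (e.g. "title")
// from a bean by invoking the matching getter (getTitle()).
public class ParamSketch {
    public static class Item {
        private final String title;
        public Item(String title) { this.title = title; }
        public String getTitle() { return title; }
    }

    static Object resolve(Object bean, String property) {
        try {
            // "title" -> "getTitle"
            String getter = "get" + Character.toUpperCase(property.charAt(0))
                    + property.substring(1);
            Method m = bean.getClass().getMethod(getter);
            return m.invoke(bean);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No readable property: " + property, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(resolve(new Item("Dune"), "title"));
    }
}
```

This is why the placeholder names in the INSERT statement must match the entity's property names exactly.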
To organize everything I've done so far, let's bring steps into the picture. Steps are basically instructions on how to build the entire pipeline in batch processing. In this particular example, we need only one step.
```java
@Bean
public Step step1(JdbcBatchItemWriter<Book> writer) {
    return stepBuilderFactory.get("step1")
            .<BookInput, Book>chunk(20)
            .reader(reader())
            .processor(getProcessor())
            .writer(writer)
            .build();
}
```
In step1() I have set the number of records to be taken as a chunk. This means that while the batch is running, the framework reads and processes records one at a time, collects them into groups of a particular size (20 in this example), and writes each group out in a single transaction.
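The grouping behind chunk(20) can be sketched in plain Java like this (using a chunk size of 3 just to keep the demo small; ChunkSketch and its helper are illustrative names, and the real framework also wraps each written chunk in a transaction):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of chunking: partition a stream of items into fixed-size
// groups; each group would be handed to the writer in one go.
public class ChunkSketch {
    static List<List<Integer>> chunk(List<Integer> items, int size) {
        List<List<Integer>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += size) {
            chunks.add(items.subList(i, Math.min(i + size, items.size())));
        }
        return chunks;
    }

    public static void main(String[] args) {
        // 7 items with chunk size 3 -> groups of 3, 3, and 1.
        System.out.println(chunk(List.of(1, 2, 3, 4, 5, 6, 7), 3));
    }
}
```

Writing in chunks rather than one row at a time is where most of the throughput benefit comes from, since the writer can batch its inserts.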
Now let's add a listener so we can log and monitor the progress of the batch job.
```java
package blog.vithushan.demobatchprocessing.Data;

import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.listener.JobExecutionListenerSupport;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;
import org.springframework.stereotype.Component;

@Component
public class JobCompletionNotificationListener extends JobExecutionListenerSupport {

    private final JdbcTemplate jdbcTemplate;

    @Autowired
    public JobCompletionNotificationListener(JdbcTemplate jdbcTemplate) {
        this.jdbcTemplate = jdbcTemplate;
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        if (jobExecution.getStatus() == BatchStatus.COMPLETED) {
            System.out.println("Batch processing completed!!");
            jdbcTemplate.query("SELECT bookID, title, authors FROM book",
                    (rs, row) -> "Book ID: " + rs.getString("bookID")
                            + " Title: " + rs.getString("title")
                            + " Authors: " + rs.getString("authors"))
                    .forEach(System.out::println);
        }
    }
}
```
Here I extend JobExecutionListenerSupport, which already provides default implementations, and override afterJob to log the way I want. I'm just printing to the console rather than setting up proper logging, since this is only a demo application.
Now let’s think about executing our configured steps. Let’s go back to BatchConfiguration.java and add the execution flow.
```java
@Bean
public Job importUserJob(JobCompletionNotificationListener listener, Step step1) {
    return jobBuilderFactory.get("importUserJob")
            .incrementer(new RunIdIncrementer())
            .listener(listener)
            .flow(step1)
            .end()
            .build();
}
```
importUserJob() builds the flow of our steps. In terms of setup we have now completed everything, but there is one more important task: check whether your dataset contains a header (label) row. If so, remove that row, since the reader would otherwise try to parse the column names as data. (Alternatively, FlatFileItemReaderBuilder provides a linesToSkip(1) setting for exactly this.) My dataset has such a header row, so let's remove it and run the application!
As the console output shows, the dataset is successfully imported into the in-memory database 👏. I hope this helps you get started with Spring Batch.
Thank you for reading!
Follow me if you are interested in programming and engineering topics. Please share my blogs with relevant people if you like them.
@linkedIn : msvithushan