Thursday, 30 October 2014

SPRING BATCH

Spring Batch is an open source framework for batch processing, that is, the execution of a series of jobs. It provides classes and APIs for reading and writing resources, transaction management, job processing statistics, job restart, and partitioning techniques for processing high volumes of data.


In Spring Batch, a job consists of many steps, and each step consists of either a READ-PROCESS-WRITE task or a single-operation task (tasklet).
  1. A “READ-PROCESS-WRITE” step “reads” data from a resource (CSV, XML or a database), “processes” it and “writes” it to another resource (CSV, XML or a database). For example, a step may read data from a CSV file, process it and write it into the database. Spring Batch provides many ready-made classes to read and write CSV, XML and database resources.
  2. A “single-operation” task (tasklet) does one thing only, such as cleaning up resources before a step starts or after it completes.
  3. The steps can be chained together to run as a job.
1 Job = Many Steps.
1 Step = 1 READ-PROCESS-WRITE or 1 Tasklet.
Job = {Step 1 -> Step 2 -> Step 3} (Chained together)
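
The job/step relationship above can be sketched in plain Java. This is only a conceptual model and does not use the Spring Batch API: the names JobModelSketch, Step, readProcessWrite and runJob are made up for illustration, and in-memory lists stand in for real resources such as files or tables.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class JobModelSketch {

    /** Hypothetical stand-in for a step: one unit of work inside a job. */
    interface Step {
        void execute();
    }

    /** READ-PROCESS-WRITE step: read each item, transform it, write the result. */
    static <I, O> Step readProcessWrite(List<I> source, Function<I, O> processor, List<O> sink) {
        return () -> {
            for (I item : source) {                 // read
                O result = processor.apply(item);   // process
                sink.add(result);                   // write
            }
        };
    }

    /** A job runs its steps one after another, in the order given. */
    static void runJob(List<Step> steps) {
        for (Step step : steps) {
            step.execute();
        }
    }

    public static void main(String[] args) {
        List<String> input = List.of("a", "b", "c");
        List<String> output = new ArrayList<>();
        List<String> log = new ArrayList<>();

        Step step1 = readProcessWrite(input, String::toUpperCase, output);
        Step step2 = () -> log.add("cleanup done"); // tasklet: a single operation

        runJob(List.of(step1, step2));
        System.out.println(output + " " + log);
    }
}
```

Here step1 plays the role of a READ-PROCESS-WRITE step and step2 the role of a tasklet; the job is simply the ordered list of both.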

Spring Batch Examples

Consider the following batch job:
  1. Step 1 – Read CSV files from folder A, process them and write them to folder B. “READ-PROCESS-WRITE”
  2. Step 2 – Read CSV files from folder B, process them and write them to the database. “READ-PROCESS-WRITE”
  3. Step 3 – Delete the CSV files from folder B. “Tasklet”
  4. Step 4 – Read data from a database, process it, generate a statistical report in XML format and write it to folder C. “READ-PROCESS-WRITE”
  5. Step 5 – Read the report and send it to the manager by email. “Tasklet”
In Spring Batch, this job can be declared as follows:


<job id="abcJob" xmlns="http://www.springframework.org/schema/batch">
  <!-- Step 1: read CSV files from folder A, process, write to folder B -->
  <step id="step1" next="step2">
    <tasklet>
      <chunk reader="cvsItemReader" writer="cvsItemWriter"
             processor="itemProcesser" commit-interval="1" />
    </tasklet>
  </step>
  <!-- Step 2: read CSV files from folder B, process, write to the database -->
  <step id="step2" next="step3">
    <tasklet>
      <chunk reader="cvsItemReader" writer="databaseItemWriter"
             processor="itemProcesser" commit-interval="1" />
    </tasklet>
  </step>
  <!-- Step 3: delete the CSV files from folder B -->
  <step id="step3" next="step4">
    <tasklet ref="fileDeletingTasklet" />
  </step>
  <!-- Step 4: read from the database, process, write an XML report to folder C -->
  <step id="step4" next="step5">
    <tasklet>
      <chunk reader="databaseItemReader" writer="xmlItemWriter"
             processor="itemProcesser" commit-interval="1" />
    </tasklet>
  </step>
  <!-- Step 5: send the report by email -->
  <step id="step5">
    <tasklet ref="sendingEmailTasklet" />
  </step>
</job>
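
Each chunk element in the declaration sets commit-interval="1". In chunk-oriented processing, items are read and processed one at a time, then written and committed in groups of commit-interval items. The plain-Java sketch below illustrates that grouping only; it is not how Spring Batch is implemented, and ChunkSketch and runChunks are hypothetical names. Adding a chunk to the returned list stands in for "write the chunk and commit the transaction".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

public class ChunkSketch {

    /**
     * Processes items one at a time and groups the results into chunks of
     * commitInterval items. Each returned inner list models one committed chunk.
     */
    static <I, O> List<List<O>> runChunks(List<I> items, Function<I, O> processor, int commitInterval) {
        if (commitInterval < 1) {
            throw new IllegalArgumentException("commitInterval must be at least 1");
        }
        List<List<O>> committed = new ArrayList<>();
        List<O> chunk = new ArrayList<>();
        for (I item : items) {
            chunk.add(processor.apply(item));   // read + process one item
            if (chunk.size() == commitInterval) {
                committed.add(chunk);           // stand-in for write + commit
                chunk = new ArrayList<>();
            }
        }
        if (!chunk.isEmpty()) {
            committed.add(chunk);               // final, possibly smaller chunk
        }
        return committed;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(1, 2, 3, 4, 5);
        System.out.println(runChunks(items, n -> n * 10, 1)); // one item per commit
        System.out.println(runChunks(items, n -> n * 10, 2)); // two items per commit
    }
}
```

With an interval of 1, as in the declaration above, every item is committed on its own; a larger interval means fewer commits, but more items are rolled back together if a chunk fails.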
The execution of every job and step is stored in a database, so a failed job can be restarted from the step where it failed instead of starting over from the beginning.
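
The restart idea can be sketched in plain Java under some simplifying assumptions: an in-memory set stands in for the database of execution records, steps are plain Runnable objects, and only step-level restart is modelled. RestartSketch, ExecutionStore and run are hypothetical names, not Spring Batch classes.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RestartSketch {

    /** Stand-in for the persisted execution records: names of completed steps. */
    static final class ExecutionStore {
        final Set<String> completed = new HashSet<>();
    }

    /**
     * Runs the steps in order, skipping any step already recorded as completed.
     * Returns the names of the steps started in this run.
     */
    static List<String> run(Map<String, Runnable> steps, ExecutionStore store) {
        List<String> started = new ArrayList<>();
        for (Map.Entry<String, Runnable> entry : steps.entrySet()) {
            if (store.completed.contains(entry.getKey())) {
                continue;                           // finished in an earlier run
            }
            started.add(entry.getKey());
            entry.getValue().run();                 // if this throws, the step stays incomplete
            store.completed.add(entry.getKey());    // record success
        }
        return started;
    }

    public static void main(String[] args) {
        boolean[] failOnce = {true};                // makes step2 fail on the first run only
        Map<String, Runnable> steps = new LinkedHashMap<>();
        steps.put("step1", () -> {});
        steps.put("step2", () -> {
            if (failOnce[0]) {
                failOnce[0] = false;
                throw new IllegalStateException("step2 failed");
            }
        });
        steps.put("step3", () -> {});

        ExecutionStore store = new ExecutionStore();
        try {
            run(steps, store);
        } catch (IllegalStateException ex) {
            System.out.println("first run: " + ex.getMessage());
        }
        System.out.println("restart ran: " + run(steps, store));
    }
}
```

On the second call step1 is skipped because it was already recorded as completed, so the run resumes at step2.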
