Friday, June 12, 2015

Spring Batch Technical Concepts

The technical side of Spring Batch comprises the language and the library components used. The core components are JobLauncher, Job, Step, JobRepository, ItemReader, ItemProcessor and ItemWriter.

A Job is launched by a JobLauncher. A Job consists of one or more Steps. Each Step typically has a reader, an optional processor and a writer. Metadata about the run is stored in a JobRepository.




Job

A Job represents the entire batch process. It sits at the top of the hierarchy in Spring Batch processing and acts as a wrapper/container for the Steps, which are executed one after another or in parallel.

A Job can be defined in an XML file that lists the steps to be executed. A Job has a name, a list of Steps and a restartable flag (yes/no). The execute(...) method executes the Job.
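
The idea of a Job as a named container of steps can be sketched in plain Java. This is only an illustration of the concept, not the Spring Batch API: SketchJob, SketchStep and the string statuses are hypothetical stand-ins, and the sketch assumes the steps run strictly one after another and that the Job stops at the first failing step.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a step: returns true on success, false on failure.
interface SketchStep {
    boolean execute(List<String> log);
}

// Hypothetical stand-in for a job: a named, ordered container of steps.
class SketchJob {
    final String name;
    final boolean restartable;
    private final List<SketchStep> steps;

    SketchJob(String name, boolean restartable, List<SketchStep> steps) {
        this.name = name;
        this.restartable = restartable;
        this.steps = new ArrayList<>(steps); // copy so the job owns its step list
    }

    // Runs the steps in order and stops at the first one that fails.
    String execute(List<String> log) {
        for (SketchStep step : steps) {
            if (!step.execute(log)) {
                return "FAILED";
            }
        }
        return "COMPLETED";
    }
}

class JobSketchDemo {
    public static void main(String[] args) {
        List<SketchStep> steps = new ArrayList<>();
        steps.add(log -> { log.add("toss"); return true; });
        steps.add(log -> { log.add("powerplay"); return true; });

        List<String> log = new ArrayList<>();
        SketchJob job = new SketchJob("matchJob", true, steps);
        System.out.println(job.execute(log) + " " + log);
    }
}
```

In Spring Batch itself the same wiring is normally declared in configuration rather than written by hand.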

Job Instance

A JobInstance represents one logical run of a Job. Its lifecycle may include several failed attempts and ends with one successful run. A successful JobInstance can't be restarted. A failed one may be restarted, depending on other factors.
Consider the example of a cricket match. A match starts (JobInstance). Runs are scored and players get injured. Then it rains and the players go off the field (Job paused). Either the rain subsides and the same JobInstance resumes, or it rains so much that the match ends in a No-Result (Job failure). In that case the match can be played again the next day: either we continue from the score where we left off (restart the failed JobInstance) or we start the match afresh (start a new JobInstance).

Only one match (JobInstance) between two given teams should be played at a given time.
getJobName() gives the name of the Job.
getInstanceId() gives the ID of a given JobInstance.


JobParameters

JobParameters are associated with a given JobInstance. They are the runtime parameters passed to a Job. They identify a JobInstance uniquely and hold the reference data required to run the Job.
In the cricket match example above, a match that starts today runs with the parameter day = 'T'. A fresh match started the next day runs with day = 'T+1', which makes it a different JobInstance. Resuming the interrupted match means launching again with day = 'T'.

These parameters need to be persisted so that they can be referenced again, for example on restart.
JobParameters is an immutable, thread-safe class. It overrides the hashCode and equals methods so that two JobParameters objects can be compared.
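
A minimal sketch of why immutability and equals/hashCode matter here, assuming a JobInstance is identified by the job name plus its parameters. SketchJobParameters and SketchJobInstanceKey are hypothetical plain-Java classes written for this illustration, not the Spring Batch classes, and the "day" parameter name is made up.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, simplified model of immutable job parameters.
final class SketchJobParameters {
    private final Map<String, String> values;

    SketchJobParameters(Map<String, String> values) {
        // Defensive copy: later changes to the caller's map cannot leak in.
        this.values = Collections.unmodifiableMap(new LinkedHashMap<>(values));
    }

    String get(String key) {
        return values.get(key);
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof SketchJobParameters
                && values.equals(((SketchJobParameters) other).values);
    }

    @Override
    public int hashCode() {
        return values.hashCode();
    }
}

// In this sketch a job instance is identified by job name plus parameters.
final class SketchJobInstanceKey {
    private final String jobName;
    private final SketchJobParameters parameters;

    SketchJobInstanceKey(String jobName, SketchJobParameters parameters) {
        this.jobName = jobName;
        this.parameters = parameters;
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof SketchJobInstanceKey)) {
            return false;
        }
        SketchJobInstanceKey that = (SketchJobInstanceKey) other;
        return jobName.equals(that.jobName) && parameters.equals(that.parameters);
    }

    @Override
    public int hashCode() {
        return Objects.hash(jobName, parameters);
    }
}

class JobParametersDemo {
    // Builds the key for a job launched with a single "day" parameter.
    static SketchJobInstanceKey key(String jobName, String day) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("day", day);
        return new SketchJobInstanceKey(jobName, new SketchJobParameters(values));
    }

    public static void main(String[] args) {
        // Same name and same day: the same instance (a restart).
        System.out.println(key("matchJob", "T").equals(key("matchJob", "T")));
        // Same name but a different day: a new instance.
        System.out.println(key("matchJob", "T").equals(key("matchJob", "T+1")));
    }
}
```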

JobExecution

A JobExecution records exactly what happened during a single run. If a given JobInstance is run several times (with the same JobParameters), each attempt gets its own JobExecution. Thus JobInstance and JobExecution have a one-to-many relationship.

A JobExecution contains information such as StartTime, EndTime, Status and JobInstanceId. A JobExecution can be stopped with stop().

A typical scenario looks like this:

S No.   JobExecutionId   JobInstanceId   StartTime   EndTime   Status
1       101              12              X           X+1       FAILED
2       102              12              X+2         X+3       FAILED
3       103              12              X+4         X+5       COMPLETED
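
The one-to-many relationship between a JobInstance and its JobExecutions, together with the rule that a successful instance cannot be run again, can be sketched as follows. The classes are hypothetical plain-Java stand-ins, the statuses are plain strings, and the IDs are just sample values.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record of a single run attempt.
final class SketchJobExecution {
    final long executionId;
    final long instanceId;
    String status = "STARTED";

    SketchJobExecution(long executionId, long instanceId) {
        this.executionId = executionId;
        this.instanceId = instanceId;
    }
}

// Hypothetical job instance that keeps the history of its executions.
final class SketchJobInstance {
    final long instanceId;
    private final List<SketchJobExecution> executions = new ArrayList<>();

    SketchJobInstance(long instanceId) {
        this.instanceId = instanceId;
    }

    // Creates a new execution unless an earlier one has already completed.
    SketchJobExecution newExecution(long executionId) {
        for (SketchJobExecution earlier : executions) {
            if ("COMPLETED".equals(earlier.status)) {
                throw new IllegalStateException(
                        "instance " + instanceId + " has already completed");
            }
        }
        SketchJobExecution execution = new SketchJobExecution(executionId, instanceId);
        executions.add(execution);
        return execution;
    }

    int executionCount() {
        return executions.size();
    }
}

class ExecutionHistoryDemo {
    public static void main(String[] args) {
        SketchJobInstance instance = new SketchJobInstance(12);
        instance.newExecution(101).status = "FAILED";    // first attempt fails
        instance.newExecution(102).status = "COMPLETED"; // restart succeeds
        System.out.println(instance.executionCount());
        try {
            instance.newExecution(103); // a completed instance cannot run again
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```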

Step

A Step holds the configuration of one phase of the batch process. If the complete Job is to play a match, the first Step can be the toss, the second Step the powerplay overs, and so on.

Job vs Step

Just like a Job, a Step is started by execute(). You can think of it as a baby step. In an application, the first step can be to read data from a database. More complex steps can perform computations on the data and process it. Everything from a simple ItemReader/ItemProcessor/ItemWriter combination to application-specific functions is performed in a Step.

StepExecution

A StepExecution is just like a JobExecution, but for a Step. Every time a Step is run, a new StepExecution is created. It holds more detail than a JobExecution, because subsequent steps depend on the outcome of this step, whereas a JobExecution covers the entire Job. For example, it contains transactional data such as commit and rollback counts.

ExecutionContext

The ExecutionContext holds the data that is persisted and is required by a StepExecution/JobExecution, for example the information required to restart.

In case of a failure, we need the state of the last step (or of some preceding step). This state is stored in key/value format. Both keys and values are user defined.

We can store the result of a particular step, for example how far it has progressed through its input.


Think of the ExecutionContext as a generic HashMap. However, we have to be careful about how much we persist, as it can bloat the program's memory.
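
A rough sketch of how key/value state supports a restart, using a plain Map as a stand-in for the ExecutionContext. ResumableListReader and the key name "reader.position" are hypothetical and exist only for this illustration. Note that only a small counter is stored, not the data itself.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader that records its position in a key/value context.
class ResumableListReader {
    static final String POSITION_KEY = "reader.position"; // illustrative key name

    private final List<String> items;
    private int position;

    ResumableListReader(List<String> items, Map<String, Object> context) {
        this.items = items;
        // Resume from the saved position if a previous run stored one.
        Object saved = context.get(POSITION_KEY);
        this.position = saved instanceof Integer ? (Integer) saved : 0;
    }

    // Returns the next item, or null when the input is exhausted.
    String read() {
        return position < items.size() ? items.get(position++) : null;
    }

    // Stores only the small piece of state needed for a restart.
    void update(Map<String, Object> context) {
        context.put(POSITION_KEY, position);
    }
}

class ContextDemo {
    public static void main(String[] args) {
        List<String> items = Arrays.asList("a", "b", "c", "d");
        Map<String, Object> context = new HashMap<>();

        ResumableListReader firstRun = new ResumableListReader(items, context);
        firstRun.read();
        firstRun.read();
        firstRun.update(context); // state saved; assume the run fails after this

        // The restarted run picks up where the first one left off.
        ResumableListReader secondRun = new ResumableListReader(items, context);
        System.out.println(secondRun.read());
    }
}
```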


JobRepository

The JobRepository persists everything we have discussed so far. At launch, a JobExecution is obtained from the JobRepository, and during the run StepExecutions and JobExecutions are persisted to it at intervals. The JobRepository is present throughout the lifecycle of a batch execution. The persistence infrastructure behind it depends on the implementation: typically a relational database, or an in-memory store for testing.

ItemReader/ItemWriter/ItemProcessor

These are the abstractions Spring Batch provides for reading the input of a Step, processing it and writing the output.

ItemReader has many implementations in Spring Batch to read from files, databases, queues etc. A reader usually keeps state such as its current position, so many implementations are not thread-safe and need synchronization if a step is multi-threaded.

ItemWriter has many implementations in Spring Batch to write to files, databases, queues etc. The implementing class needs to take care of serializing the objects it writes.

ItemProcessor holds the logic applied to each item between reading and writing, such as transforming or filtering it. Spring Batch provides a few implementations, and most applications write their own as required.

It is possible to write custom reader/writer/processor as we will see later.
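
To make the division of labour concrete, here is a simplified read/process/write loop in plain Java. It is a sketch under stated assumptions, not Spring Batch code: SketchReader, SketchProcessor, SketchWriter and ChunkLoop are hypothetical, a reader signals the end of input by returning null, a processor drops an item by returning null, and the chunk size counts items read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-ins for the three abstractions.
interface SketchReader<T> {
    T read(); // returns null when the input is exhausted
}

interface SketchProcessor<I, O> {
    O process(I item); // returns null to drop the item
}

interface SketchWriter<T> {
    void write(List<T> items); // receives one chunk at a time
}

class ChunkLoop {
    // Reads up to chunkSize items, processes each one, then writes them together.
    // Returns the number of chunks handed to the writer.
    static <I, O> int run(SketchReader<I> reader, SketchProcessor<I, O> processor,
                          SketchWriter<O> writer, int chunkSize) {
        int chunksWritten = 0;
        boolean exhausted = false;
        while (!exhausted) {
            List<O> chunk = new ArrayList<>();
            for (int i = 0; i < chunkSize; i++) {
                I item = reader.read();
                if (item == null) {
                    exhausted = true;
                    break;
                }
                O processed = processor.process(item);
                if (processed != null) {
                    chunk.add(processed);
                }
            }
            if (!chunk.isEmpty()) {
                writer.write(chunk);
                chunksWritten++;
            }
        }
        return chunksWritten;
    }
}

class ChunkDemo {
    public static void main(String[] args) {
        Iterator<String> source = Arrays.asList("a", "", "b", "c").iterator();
        List<String> written = new ArrayList<>();

        SketchReader<String> reader = () -> source.hasNext() ? source.next() : null;
        // Drops empty strings and upper-cases everything else.
        SketchProcessor<String, String> processor = s -> s.isEmpty() ? null : s.toUpperCase();
        SketchWriter<String> writer = written::addAll;

        int chunks = ChunkLoop.run(reader, processor, writer, 2);
        System.out.println(chunks + " chunks written: " + written);
    }
}
```

Writing a whole chunk at once is what lets a step commit or roll back a group of items together.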

