Some applications have to react to events immediately, without delay; for such applications we use stream processing.
Let's consider an application that generates data at one-hour intervals, while the stream processing time is 10 minutes. Each output follows its input by only 10 minutes (if there is enough data).
No issue occurs, and there will be no data waiting in the bucket for the processing unit, because it finishes processing each input before the next one even arrives.
Stream processing Time < Data Arrival Time
Let's assume another scenario, where the input data arrives at 10-minute intervals and the processing takes 1 hour.
Now output is produced only once every hour. In this case there will always be data waiting in the bucket for the processing unit to process. As time goes on, data keeps accumulating at the entry point because of the long processing time.
Stream processing Time > Data Arrival Time
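Both scenarios can be sketched with a toy single-unit queue model. This is a minimal simulation (the function name and time units in minutes are my own, not from any framework) that tracks how long each item waits before the one processing unit gets to it:

```python
def waiting_times(arrival_interval, processing_time, num_items):
    """Minutes each item waits before a single processing unit starts on it."""
    free_at = 0.0               # when the unit becomes idle again
    waits = []
    for i in range(num_items):
        arrival = i * arrival_interval
        start = max(arrival, free_at)       # wait if the unit is still busy
        free_at = start + processing_time
        waits.append(start - arrival)
    return waits

# Scenario 1: processing (10 min) faster than arrivals (60 min) -> nothing waits.
print(waiting_times(60, 10, 5))   # [0.0, 0.0, 0.0, 0.0, 0.0]

# Scenario 2: processing (60 min) slower than arrivals (10 min) -> waits grow forever.
print(waiting_times(10, 60, 5))   # [0.0, 50.0, 100.0, 150.0, 200.0]
```

The second run shows the accumulation directly: every new item waits 50 minutes longer than the previous one, which is exactly the backlog building up at the entry point.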
In a legacy system, this accumulation keeps growing and the data becomes useless, because we need to process the data as it happens. This is where the solution comes into the picture: instead of having one processing unit for the stream processing, if we have more than one unit we can distribute the load across the units/systems.
When we have a distributed system, the data is routed to a free unit rather than waiting for a single unit to become free.
Apache Storm brings this distributed approach to stream processing.
But isn't Hadoop already a distributed system? Yes, but Hadoop is a distributed system for batch processing, and this is the major difference between Hadoop and Apache Storm.
Hadoop performs its processing at periodic intervals, but Apache Storm processes the data as it comes into the system.
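The latency difference between the two styles can be illustrated with one more toy calculation (both functions and all numbers are hypothetical, in minutes): in a batch system every item waits for the end of its batch window, while in a streaming system each item only pays its own processing time.

```python
def batch_latency(arrival_interval, batch_period, num_items):
    """Each item waits until the next batch boundary before it is processed."""
    latencies = []
    for i in range(num_items):
        arrival = i * arrival_interval
        # the next batch runs at the next multiple of batch_period
        boundary = ((arrival // batch_period) + 1) * batch_period
        latencies.append(boundary - arrival)
    return latencies

def stream_latency(num_items, processing_time):
    """Each item is processed as soon as it arrives (assuming a free unit)."""
    return [processing_time] * num_items

# Arrivals every 10 min, batch job every 60 min: early arrivals wait up to an hour.
print(batch_latency(10, 60, 6))   # [60, 50, 40, 30, 20, 10]

# Streaming with a 1-minute processing time: every item is answered in 1 minute.
print(stream_latency(6, 1))       # [1, 1, 1, 1, 1, 1]
```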
Between these two there is one more tool called