Apache Storm is real-time data processing software. It can sift through data to find a particular trend or similar words across queries.
Storm lets developers build powerful, highly responsive applications: finding trending topics on Twitter, monitoring spikes in payment failures, and so on.
Apache Storm is a free and open-source distributed real-time computation system.
Data that flows into your system continuously is called streaming data.
For example, say every Uber cab out on the street sends its location back to Uber's servers. This location information for each car is then used to match riders with the nearest available cab.
Here, the location information continuously flowing to the central servers forms a continuous stream of data records. This is streaming data.
User clicks on a web page, continuously collected at the servers, are also streaming data.
Storm processes a stream of data as it arrives. As soon as a log record arrives, the required processing is done on that record and it is marked as done. Apache Storm is built to serve such real-time stream-processing requirements.
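This record-at-a-time model can be sketched in plain Python. The generator below is only an illustration of the idea, not Storm's actual API: each record is processed the moment it arrives, and a result is emitted immediately rather than waiting for a batch.

```python
from collections import Counter

def process_stream(records):
    """Hypothetical sketch: process each record the moment it arrives."""
    word_counts = Counter()
    for record in records:        # in a real system, records arrive continuously
        word_counts[record] += 1  # processing happens immediately on arrival
        yield record, word_counts[record]  # record is now "done"

# Each record produces output as soon as it is seen:
results = list(process_stream(["login", "click", "login"]))
```

In Storm itself, this per-record logic would live inside a bolt, with spouts feeding the records in.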
Apache Storm continues to be a leader in real-time data analytics. Storm is easy to set up and operate, and it guarantees that every message will be processed through the topology at least once.
Batch processing is where the processing happens for blocks of data that have already been stored over a period of time.
For example, consider Twitter at the time of the Oscars. During the ceremony, #Oscars will be the leading tag on Twitter, but after some time another tag, such as #Moonlight or #12YearsASlave, takes the lead.
In the case of batch processing, the database or data storage system collects all the tweets for a specific window, say 10 minutes. Once the data is collected, it initiates processing of the tweets.
Let's say the processing takes 2 minutes; Twitter can then change the leading tag only once every 12 minutes (10 minutes of collecting + 2 minutes of processing).
The above process is known as batch processing because the tweets/data are collected for a given period of time and only then processed. The processing always happens on a batch of data at fixed intervals.
Batch processing of the data therefore provides periodic updates on the trend.
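The collect-then-process cycle can be sketched as follows. This is a hypothetical illustration (the window here is a fixed number of tweets rather than 10 minutes of wall-clock time): the leading tag is recomputed only once per batch, never in between.

```python
from collections import Counter

def batch_trend(tweets, window_size):
    """Hypothetical sketch: compute the leading hashtag once per batch."""
    trends = []
    for start in range(0, len(tweets), window_size):
        batch = tweets[start:start + window_size]      # collect the whole window first
        leading, _ = Counter(batch).most_common(1)[0]  # only then process it
        trends.append(leading)                         # one update per batch
    return trends
```

With a window of 5 tweets, a shift in the conversation only shows up when the next batch is processed, which mirrors the 12-minute update cycle described above.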
Consider the above Oscars example again. In stream processing, whenever data is received, processing is initiated on it; for an initial period the system may need to accumulate some data, but from that point on it updates the trends with every tweet.
In the case of stream processing, we wait a short period, say 10 minutes (until we have enough data), to calculate the first set of trends; once we have enough data, we process it.
From that moment onwards there is always enough data to process the incoming tweets, so there is no need to wait for periodic updates.
Stream processing thus provides continuous updates on the trend.
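The contrast with the batch sketch can be shown with an incremental version. Again a hypothetical illustration: the counts are updated on every arriving tweet, so the leading tag is always current and can change immediately after any single tweet.

```python
from collections import Counter

def streaming_trend(tweets):
    """Hypothetical sketch: update the leading hashtag on every arriving tweet."""
    counts = Counter()
    for tag in tweets:
        counts[tag] += 1                      # incremental update, no batch boundary
        leading, _ = counts.most_common(1)[0]
        yield leading                         # trend available after every tweet
```

Unlike the batch version, the moment #Moonlight overtakes #Oscars in the running counts, the very next emitted trend reflects it.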
In general, stream processing is useful in cases where we can detect a problem and respond quickly enough to improve the outcome. It also plays a key role in a data-driven organization.