Big Data Apex Center


We provided training to our final-year CSE/IT students in “Cloud application development using the Hadoop Distributed File System (HDFS) and MapReduce”. The following are the details of the mini projects currently being carried out in the “Big Data Research Center”.

1. Development of Prototype Web Search Engine

In this project, students have built an inverted index (consisting of keywords and URLs) by reading HTML files stored in HDFS using MapReduce programs. The index file is also stored in HDFS. The URLs are ordered by rank, which is calculated from the number of in-links to each URL. The students have developed the GUI using the Servlet API and linked it with the MapReduce programs.
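The indexing and ranking steps described above can be sketched in plain Java. This is a minimal single-process illustration of the map and reduce phases, not the students' Hadoop code: the class name `InvertedIndexSketch`, the simple word tokenizer, the sample URLs and the in-link counts are all assumptions made for the example, and the page text is assumed to have already been extracted from the HTML.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: a real job would use Hadoop Mapper and Reducer classes.
class InvertedIndexSketch {

    // pages: URL -> page text; inLinks: URL -> number of in-links (assumed to be precomputed).
    static Map<String, List<String>> build(Map<String, String> pages, Map<String, Integer> inLinks) {
        // "Map" phase: emit (keyword, URL) for every word of every page, grouped by keyword.
        Map<String, TreeSet<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> page : pages.entrySet()) {
            for (String word : page.getValue().toLowerCase(Locale.ROOT).split("\\W+")) {
                if (word.isEmpty()) {
                    continue;
                }
                grouped.computeIfAbsent(word, k -> new TreeSet<>()).add(page.getKey());
            }
        }
        // "Reduce" phase: order each keyword's URLs by in-link count, highest first.
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, TreeSet<String>> entry : grouped.entrySet()) {
            List<String> urls = new ArrayList<>(entry.getValue());
            // The sort is stable, so URLs with equal rank stay in alphabetical order.
            urls.sort(Comparator.comparingInt((String u) -> inLinks.getOrDefault(u, 0)).reversed());
            index.put(entry.getKey(), urls);
        }
        return index;
    }

    public static void main(String[] args) {
        Map<String, String> pages = new TreeMap<>();
        pages.put("http://a.example", "Hadoop stores files in HDFS");
        pages.put("http://b.example", "MapReduce runs on Hadoop");
        Map<String, Integer> inLinks = new TreeMap<>();
        inLinks.put("http://a.example", 2);
        inLinks.put("http://b.example", 5);
        System.out.println(build(pages, inLinks).get("hadoop"));
    }
}
```

Ranking by raw in-link count keeps the example close to the description; any weighting of links would be a further design decision.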

2. Tourism Information System

Students have developed a search facility to locate places to visit in a particular city. The details of cities and places are stored in HDFS file(s). MapReduce programs perform the search in parallel, and the results are also stored in HDFS. The students have developed a GUI using the Servlet API and linked it with the MapReduce programs. This project provides the following facilities to users: (a) searching for famous places in the city, (b) locating good restaurants in the city, (c) locating shopping malls, (d) transport facilities available, with route maps, and (e) adding comments or reviews. We are planning to deploy this software in our PRIVATE CLOUD (which is being built using OPENSTACK) and make it available for the use of our students and staff.

3. Anti Plagiarism Detection System

Plagiarism is copying another author’s thoughts, expressions and ideas and representing them as one’s own original work. In this project, our students have developed MapReduce programs that identify copied sentences in a given document (GD) by searching for them in a number of already existing documents (EDs). Note that both the GD and the EDs are stored in HDFS. The students have developed the GUI using the Servlet API.
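The sentence-matching idea described above can be sketched as follows. This is a minimal single-process illustration that assumes exact matching after normalising case, punctuation and spacing; the class name `PlagiarismSketch`, the sentence splitter and the sample documents are hypothetical, and the students' MapReduce programs may use a different matching rule.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of sentence-level copy detection between a GD and several EDs.
class PlagiarismSketch {

    // Lower-case the sentence and collapse punctuation and spacing into single spaces.
    static String normalize(String sentence) {
        return sentence.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]+", " ").trim();
    }

    // Split text into normalised, non-empty sentences.
    static List<String> sentences(String text) {
        List<String> result = new ArrayList<>();
        for (String raw : text.split("[.!?]+")) {
            String normalized = normalize(raw);
            if (!normalized.isEmpty()) {
                result.add(normalized);
            }
        }
        return result;
    }

    // Returns each copied GD sentence together with the names of the EDs that contain it.
    static Map<String, List<String>> findCopied(String givenDoc, Map<String, String> existingDocs) {
        // "Map" step: emit (sentence, document name) for every sentence of every ED.
        Map<String, List<String>> sentenceToDocs = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : existingDocs.entrySet()) {
            for (String sentence : sentences(doc.getValue())) {
                List<String> docs = sentenceToDocs.computeIfAbsent(sentence, k -> new ArrayList<>());
                if (!docs.contains(doc.getKey())) {
                    docs.add(doc.getKey());
                }
            }
        }
        // "Reduce" step: keep only the sentences that also occur in the GD.
        Map<String, List<String>> copied = new LinkedHashMap<>();
        for (String sentence : sentences(givenDoc)) {
            List<String> docs = sentenceToDocs.get(sentence);
            if (docs != null) {
                copied.put(sentence, docs);
            }
        }
        return copied;
    }

    public static void main(String[] args) {
        Map<String, String> existing = new LinkedHashMap<>();
        existing.put("ed1", "Hadoop is a framework! It scales well.");
        existing.put("ed2", "Completely unrelated text.");
        System.out.println(findCopied("Hadoop is a framework. I wrote this myself.", existing));
    }
}
```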

4. Market Analyzing System

The main aim of the project is to find out where (places in a city, villages in a taluk, or cities and villages in a taluk, district, state or country) items such as vegetables, fruits and rice are available at the lowest cost, and to help customers purchase those products at the lowest price once the transport cost is also taken into account. This project is developed using HDFS and MapReduce. The GUI of the project is developed using the Servlet API.
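The comparison described above, item price plus transport cost, can be sketched in a few lines of Java. The class names `MarketSketch` and `Offer`, the flat per-trip transport cost and the sample prices are assumptions made for illustration only; they are not taken from the project.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: choose the place with the lowest total cost for a given quantity.
class MarketSketch {

    static class Offer {
        final String place;
        final double pricePerKg;
        final double transportCost; // assumed to be a flat cost per trip

        Offer(String place, double pricePerKg, double transportCost) {
            this.place = place;
            this.pricePerKg = pricePerKg;
            this.transportCost = transportCost;
        }

        double totalCost(double quantityKg) {
            return pricePerKg * quantityKg + transportCost;
        }
    }

    // Returns the offer with the lowest total cost, or null if there are no offers.
    static Offer cheapest(List<Offer> offers, double quantityKg) {
        Offer best = null;
        for (Offer offer : offers) {
            if (best == null || offer.totalCost(quantityKg) < best.totalCost(quantityKg)) {
                best = offer;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Offer> offers = Arrays.asList(
                new Offer("Village A", 20.0, 0.0),
                new Offer("Town B", 15.0, 60.0));
        System.out.println(cheapest(offers, 10).place);
        System.out.println(cheapest(offers, 20).place);
    }
}
```

With these sample figures the cheaper unit price only wins once the quantity is large enough to absorb the transport cost, which is why the transport cost has to be part of the comparison.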

5. Transport Information System

This system helps visitors to a city select a mode of transport for visiting various places in that city in a cost-effective manner. It also provides a route map for reaching a destination from a particular place in the city. In addition, the system provides information about traffic on a particular road and suggests alternative path(s) to the destination. All transport details and feedback details are stored in HDFS. MapReduce programs are developed to select the transport and to give traffic information. Route maps are obtained by linking with Google Maps. The GUI of the project is developed using the Servlet API.

We are planning to deploy this software in our PRIVATE CLOUD (which is being built using OPENSTACK) and make it available for the use of our students and staff.

6. Advertisement Posting Based on User Behavior

The main aim of the project is to post advertisements on a website (one that lets users chat with each other over the web) based on user behavior. The behavior of a user is determined by storing and analyzing user details, user actions, user hobbies, etc. HDFS is used to store the user details, and MapReduce programs are used to identify similar interests. The Servlet API is used to develop the GUI for the project.

7. Village Information System

The aim of this web-based system is to establish a direct connection between producers (farmers) and consumers for purchasing/selling crops, cattle, dairy products, handicrafts, handlooms, etc. In this project, HDFS is used to store the farmers' details, and MapReduce programs are developed to search for a particular commodity in parallel. The Servlet API is used to develop the GUI for the project. The advantages of this system are: (i) it gives users complete information about the commodities and items available in a village; (ii) it gives users a reasonable price for commodities/items, lower than the retail price and therefore profitable to them; (iii) the producer also profits, as he/she is free to quote his/her own price and obtain the value of his/her goods.

8. Chain of Stores

The main aim of this project is to locate a nearby store where a customer can purchase a given product at the lowest cost, including the cost of transportation. The project also provides total sales information for the chain of stores and suggests where to open a new branch. In this project, HDFS is used to store product details, and MapReduce programs are developed to search for a particular product in parallel. The Servlet API is used to develop the GUI for the project.

9. Government Hospital Management System

Government Hospital Management System is a web-based application designed for government hospitals to support hospital administration and management activities. A patient can choose a particular hospital and doctor by entering his/her details. The system also provides doctors with information about the availability of medicines. HDFS is used to store the details of patients, doctors, hospitals and medicines. MapReduce programs are used to locate a hospital, to identify a doctor and to check the availability of medicines. The Servlet API is used to develop the GUI for the project.


The main aim of the project is to suggest measures for crop protection. The system suggests pesticides for a crop based on the symptoms provided by the users (farmers). For a given crop, the system can suggest which diseases may occur and which protection measures should be taken. In this project, HDFS is used to store crop names, the symptoms of the various crop diseases, disease details and pesticide information. MapReduce programs are developed to suggest protection measures for the crop. The Servlet API is used to develop the GUI for the project.

NOTE: The team led by Mr. TARUN TATIKONDA of IV IT (Project title: Transport Information System) has qualified to participate in Phase II of “Unisys Cloud 20/20”, an all-India cloud project competition.


1. Novel Data Replication Strategy in HADOOP

Data management is a major issue in today’s world, where data sets hold vast amounts of information. Fast retrieval and high availability of data are equally important. Both can be addressed by using “REPLICATION” (the process of maintaining multiple copies of the same data and sharing the data with multiple ...). The replication strategy in Hadoop is to store three replicas: two in one rack and the third in another rack, the one nearest to the first. With this strategy there is a 0.1% chance of losing two rack servers at the same time. To resolve this issue, we propose a novel data replication strategy that decreases the chance of losing data and increases the availability of data.

2. A Novel Scheduling Algorithm in HADOOP

In the present scenario, scheduling is essential for managing big data in any environment. Scheduling is the method by which threads, processes or data flows are given access to system resources. The need for a scheduling algorithm arises from the requirement of most modern systems to perform multitasking and multiplexing. In this paper, we propose preemption and priority-based scheduling in Hadoop. The proposed algorithm allows the scheduler to make more efficient decisions based on the priority of the jobs submitted to the Hadoop system.
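The general idea of preemption and priority-based scheduling can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the proposed algorithm: it models a single execution slot advancing in unit time steps, and the class names `PrioritySchedulerSketch` and `Job`, the "larger number means higher priority" rule and the sample jobs are hypothetical. It does not use the Hadoop scheduler API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of preemptive, priority-based scheduling on one execution slot.
class PrioritySchedulerSketch {

    static class Job {
        final String name;
        final int priority; // assumption: a larger value means a higher priority
        final int arrival;  // time step at which the job is submitted
        int remaining;      // time steps of work still to do

        Job(String name, int priority, int arrival, int duration) {
            this.name = name;
            this.priority = priority;
            this.arrival = arrival;
            this.remaining = duration;
        }
    }

    // Runs the jobs to completion and returns their names in completion order.
    // Note: the simulation updates the 'remaining' field of the given jobs.
    static List<String> run(List<Job> jobs) {
        List<Job> pending = new ArrayList<>(jobs);
        pending.sort(Comparator.comparingInt((Job j) -> j.arrival));
        PriorityQueue<Job> ready = new PriorityQueue<>(
                Comparator.comparingInt((Job j) -> -j.priority).thenComparingInt(j -> j.arrival));
        List<String> finished = new ArrayList<>();
        int time = 0;
        int next = 0;
        while (finished.size() < jobs.size()) {
            // Admit every job that has arrived by the current time step.
            while (next < pending.size() && pending.get(next).arrival <= time) {
                ready.add(pending.get(next++));
            }
            // Re-selecting at every step is what lets a new high-priority job preempt a running one.
            Job current = ready.poll();
            time++;
            if (current == null) {
                continue; // nothing to run yet, the slot stays idle for this step
            }
            current.remaining--;
            if (current.remaining <= 0) {
                finished.add(current.name);
            } else {
                ready.add(current);
            }
        }
        return finished;
    }

    public static void main(String[] args) {
        List<Job> jobs = new ArrayList<>();
        jobs.add(new Job("low", 1, 0, 3));
        jobs.add(new Job("high", 5, 1, 2));
        System.out.println(run(jobs));
    }
}
```

In the sample run the low-priority job starts first but is preempted as soon as the high-priority job arrives, so the high-priority job completes first.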

3. CHASU (Chat Hadoop Survey)

Smartphones play a key role for customers who use mobile chat, and they have an impact on social networking sites such as Facebook and Twitter. This project presents a mobile chat solution that offers a rich, “desktop-like” chat experience on a mobile device, with Hadoop and IBM Worklight as prerequisites. It also features a few new mechanisms, namely a friend, like and prediction system, which set the chat app apart from ordinary ones.

4. Weather Damage Prophecy using BIGDATA

In today’s world, the prediction of weather is a challenging task. Weather prediction is the application of both science and technology to predict the weather for a particular location. Predictions involve many parameters that must be measured, which is impossible with limited observations. Present predictions are not very accurate, and the damage caused by the weather is not estimated, because the time span between the present moment and the time for which the forecast is made and released varies. To address these problems, we have come up with an algorithm that predicts the scope of the damage caused by the weather.

5. Multi Language Document Clustering using HDFS

As there are hundreds of languages, millions of documents in multiple languages exist on the web. The ability of anyone to find information expressed in their own language is an important aspect to consider. In this project we present the selection of documents that may contain many languages. Multi-language clustering refers to the capability of users to retrieve documents written in a language different from the query language. This requirement is clarified by stating that, in multi-language information access, information is searched, retrieved and presented effectively, without constraints due to the different languages and scripts used in the documents and their metadata.

6. Web User Interface for HDFS

The HDFS Web UI is an HTML-based application used to configure and manage HDFS from a remote client (web browser). By extending the HDFS Web UI, we can build a consistent, reliable and extensible UI.

The HDFS Web UI framework supports a simplified interaction model that covers all types of operations performed on HDFS: file handling, permissions, replication, etc. To present these items, the UI provides several page types.

7. Supplanting HDFS with BSFS

Hadoop is a software framework based on the MapReduce programming model. It depends on the Hadoop Distributed File System (HDFS) as its primary storage system. To improve performance, the HDFS layer of Hadoop can be replaced with a new, concurrency-optimized data storage layer called the BlobSeer file system (BSFS). The aim of the project is to compare the performance of HDFS and BSFS. In the proposed system, the performance of HDFS and BSFS is tested using create, read, write and delete operations in a distributed environment. The goal is to find out which file system gives the best performance for large and small datasets, as well as the throughput time of each file system.

8. Web Search Engine Using Hadoop MapReduce

The Internet has become an extremely handy tool for releasing, disseminating and obtaining information. In this vast network of information, search engine technology plays the role of information navigation. A SEARCH ENGINE is a huge collection of everything in this world. In this project, we are going to develop a SEARCH ENGINE that searches for the required content efficiently using Hadoop MapReduce. The project aims to speed up queries and to find exactly those documents that semantically agree with the word provided by the user. This makes it easy for the user to find the required data very fast.

9. A New Strategy For Video Recommendation

The system recommends personalized sets of videos to users based on their activity on the site. We discuss some of the unique challenges that the system faces and how we address them. Recommendations will be made based on user behaviour and the videos that users visit frequently, taking recent history into account. In addition, when a video that was not available at the time a user requested it becomes available, we will show it to that user.
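One possible reading of "frequently visited videos" combined with "recent history" is sketched below. It is an assumption-laden illustration rather than the project's strategy: it recommends videos that were watched often by other users who share at least one of the target user's recently watched videos. The class name `VideoRecommenderSketch`, the `recentWindow` and `limit` parameters and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of a co-viewing recommender driven by recent history.
class VideoRecommenderSketch {

    // history: user -> videos watched, oldest first.
    static List<String> recommend(String user, Map<String, List<String>> history,
                                  int recentWindow, int limit) {
        List<String> own = history.getOrDefault(user, Collections.<String>emptyList());
        Set<String> seen = new HashSet<>(own);
        // Only the last 'recentWindow' videos of the user count as recent history.
        List<String> recent = own.subList(Math.max(0, own.size() - recentWindow), own.size());

        // Count how often unseen videos were watched by users with overlapping recent history.
        Map<String, Integer> score = new TreeMap<>();
        for (Map.Entry<String, List<String>> other : history.entrySet()) {
            if (other.getKey().equals(user) || Collections.disjoint(other.getValue(), recent)) {
                continue;
            }
            for (String video : other.getValue()) {
                if (!seen.contains(video)) {
                    score.merge(video, 1, Integer::sum);
                }
            }
        }

        // Most frequently watched first; the stable sort keeps ties in alphabetical order.
        List<String> ranked = new ArrayList<>(score.keySet());
        ranked.sort(Comparator.comparingInt((String v) -> score.get(v)).reversed());
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }

    public static void main(String[] args) {
        Map<String, List<String>> history = new LinkedHashMap<>();
        history.put("alice", List.of("v1", "v2"));
        history.put("bob", List.of("v2", "v3", "v4"));
        history.put("carol", List.of("v2", "v3"));
        System.out.println(recommend("alice", history, 1, 2));
    }
}
```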