Since the beginning of this decade, big data has emerged as a promising paradigm for realizing data-intensive projects and applications that seemed impossible only a few years ago. Using new technologies and techniques, massive amounts of differently structured data from various sources can be stored, processed, and visualized with relative ease. Enterprises in particular are aware of its potential at several stages, such as improving knowledge generation, organizational agility, business processes, and competitive performance.
However, for an enterprise, a big data project is a major investment that can lead to profound changes on various levels, especially with regard to infrastructure. Introducing a big data project is therefore a strategic decision that needs to be made on the basis of a solid understanding.
Over time, several architectural approaches have been developed to support the realization of such projects. Often derived from a number of successful implementations, these generalized solutions serve as a kind of template. Their component-based composition leaves room for individual implementation decisions regarding technologies, techniques, and tools. Often, however, the compatibility properties of the individual components differ considerably: while entire platforms exist for some tools, in which the individual elements can be linked and integrated without major complications, in other areas special adapters, APIs, or other specific integration approaches have to be applied, as the sketch below illustrates.
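To make this integration problem concrete, the following minimal sketch shows how such an adapter layer can hide heterogeneous sources behind one uniform interface, so that downstream components remain exchangeable. All names (SourceAdapter, read_records, ingest) are hypothetical illustrations and not taken from any specific platform or reference architecture.

    from abc import ABC, abstractmethod
    from typing import Iterable
    import csv
    import json
    import urllib.request


    class SourceAdapter(ABC):
        """Uniform record interface that downstream components rely on."""

        @abstractmethod
        def read_records(self) -> Iterable[dict]:
            """Yield each record of the underlying source as a dictionary."""


    class CsvFileAdapter(SourceAdapter):
        """Wraps a local CSV file behind the common interface."""

        def __init__(self, path: str) -> None:
            self.path = path

        def read_records(self) -> Iterable[dict]:
            with open(self.path, newline="") as handle:
                yield from csv.DictReader(handle)


    class RestApiAdapter(SourceAdapter):
        """Wraps a (hypothetical) REST endpoint returning a JSON array."""

        def __init__(self, url: str) -> None:
            self.url = url

        def read_records(self) -> Iterable[dict]:
            with urllib.request.urlopen(self.url) as response:
                yield from json.load(response)


    def ingest(sources: list[SourceAdapter]) -> list[dict]:
        # The ingestion step depends only on the abstract interface,
        # never on the concrete tool behind each adapter.
        return [record for source in sources for record in source.read_records()]

In an actual architecture, each additional tool would require its own adapter of this kind, which is precisely the integration effort that ready-made platforms spare their users.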
The main goal of this work is to provide both an overview and a comparison of currently existing reference architectures in this area. In addition to examining the various components and their similarities and differences, it is necessary to determine which approach is best suited to a given intended use. Furthermore, the requirements that have to be fulfilled beforehand in order to improve the chances of a successful application need to be identified.
Working plan: