Home > A few words about... > An overview of the Business Intelligence world

An overview of the Business Intelligence world

Business Intelligence (BI) is a field of computer-science the goal of which is to build decision support systems.

According to Wikipedia, BI involves the means, the tools and the methods used to collect, consolidate and produce the data model, and to finally display it. It is designed:

  • to help a company taking decisions;
  • to cover its whole activity for people in charge of the strategy.

Simply put, BI is the art of transforming raw data into valuable information.

This article begins with an introduction to BI, then describes the main actors of the BI market, and what their products are designed for.

Business Intelligence

Setting up a BI solution means building a decision tool from internal and external data. The goal of BI is to provide the information required to take decisions. The main steps of any BI projects include:

  1. Collecting data from all the existing sources and putting it into an ODS. Data is simply extracted from RDBMS, ERP, CRM systems, flat files, structured files (XML, CSV, XLS…) and loaded into a DBMS.
  2. Consolidating data into the DWH and as many DM as needed (one or more by line of work, depending on the needs). Data is extracted from the ODS and loaded into the DWH and the DM after some transformations. One also uses OLAP Cubes for projects involving multi-dimensional data.
  3. Consuming data using front-ends such as web or mobile applications. These applications are built using specific tools and allow the user to display pre-computed dashboards or to explore the information stored in the DWH, the DM or the Cube.
  4. Analyzing data with data-mining tools.

Key steps of BI: collect, consolidate, consume, analyze.

BI uses three kinds of technologies (we won’t talk about data-mining):

  • data integration tools (also known as ETL or ELT) to collect and consolidate data;
  • databases to store data into the ODS, the DWH, the DM or the cubes;
  • reporting tools to create dashboard, scorecards and to let end-users explore the data.

Each technology may be used independently and for different purposes.

The following figure describes the data lifecycle from the sources to the end-user.

Data lifecycle: integration, storage, presentation to the end-user.

Data Integration

Data integration is performed using ETL or ELT tools.

An ETL extracts data from different sources, applies transformations (cleansing, renaming, computing) using its own engine (implemented in general-purpose programming languages, such as Java, C++…) and loads data into the target tables. Informatica PowerCenter, IBM Datastage, Microsoft SQL Server Integration Services, SAP BO Integration and Talend Enterprise Data Integration are ETLs.

Unlike in an ETL, the transformation process in an ELT is performed using the target DMBS engine (SQL language), like Oracle Data Integrator.

In both cases, data integration tools move data from many different sources to a single target that may contain many tables. To choose an ETL/ELT tool, a company may consider the cost of the tools, the possibility to load data from all of its sources’ kinds, the language used (Java, SQL, Perl…), as well as the IDE that comes with it.

Databases

Databases store the result of data integration. They need to be really efficient, especially for reading operations, because many end-users may use dashboards or data exploration at the same time and on a lot of data. In contrast, writing operations are often performed off the working hours.

The most popular databases are relational, like Microsoft SQL Server, IBM DB2, Oracle RDBMS, MySQL or PostgreSQL. The choice between them is done depending on their price (some are free, some other are very expensive) but also by looking at the administration tools, the IDE and the SQL extensions provided by the editor. Although RDBMS are used, data modeling is typically done according to a star schema, which does not fit into the normal form.

Example of a star schema.

Teradata Database and Greenplum are also RDBMS but they are specialized in big volumes and use massive parallel processing. For companies with big volumes, it’s almost mandatory to use such databases to keep good performances.

Finally, in order to optimize reading operations, some databases like SAP Sybase IQ are column-oriented (as opposed to RDBMS that are row-oriented).

Reporting

Last but not least, the reporting tools are very important because they are the only mean by which end-users will interact with the information. The reporting market is more complex because there is a large number of editors that are very different from each other. But at the end, the way they work is very similar:

  1. Developers must specify the way the tool accesses data: connection to the database, data modeling (using star schema, dimensions and fact tables).
  2. Then end-users explore data themselves or / and use pre-computed dashboards.
  3. The leaders in this market are (more because of acquisitions and the editors’ reputation than on the basis of some quality criteria):

  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: