Excel is an effective tool for statistical data processing, and finding correlations is an important part of that work. The program has all the tools needed for such calculations. Today we will take a closer look at what it takes to carry out this type of analysis.
What is correlation analysis
In simple terms, correlation is a relationship between two phenomena, and correlation analysis means identifying that relationship. A very common statement says that correlation is a dependence between different objects, but this is an imprecise definition. After all, there are plenty of charts showing a connection between phenomena that cannot depend on each other, or that are both driven by some third factor.
To determine the dependence, another type of analysis is used, which is called regression.
The value that measures the strength of the relationship is called the correlation coefficient. Unlike regression, correlation analysis calculates only this one value. The coefficient ranges from -1 to 1. If it is positive, the relationship is direct: as the first variable grows, the second tends to grow too. If it is negative, an increase in the first variable goes with a decrease in the second. A coefficient of exactly 1 or -1 means the points lie on a straight line; if it is less than one in absolute value, the correlation is incomplete. For example, a coefficient of 0.5 means that the second variable tends to rise with the first, but with considerable scatter. Note that the coefficient does not say by how much one variable changes when the other changes by one unit; that is the slope, which regression estimates. If the correlation coefficient is 0, there is no linear relationship between the two variables.
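To make the range and sign of the coefficient concrete, here is a minimal Python sketch that computes the Pearson correlation coefficient by hand. The function name `pearson` and the three small data sets are made up for illustration only.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalized variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data, chosen only to illustrate the three cases.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # grows in lockstep with x
falling = [10, 8, 6, 4, 2]   # shrinks in lockstep with x
noisy = [2, 1, 4, 3, 5]      # tends to grow with x, but not strictly

print(round(pearson(x, rising), 3))   # 1.0
print(round(pearson(x, falling), 3))  # -1.0
print(round(pearson(x, noisy), 3))    # 0.8
```

The first two series lie exactly on a straight line, so the coefficient reaches the ends of the range; the third only tends to follow x, so the value falls between 0 and 1.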
An interesting fact: correlations are divided into true and false. Sometimes the fact that two graphs move in the same direction is pure coincidence, not a natural consequence of one variable affecting the other or of a common factor influencing both. Charts in which completely unrelated phenomena correlate with each other are quite popular in some circles. Here are some examples:
- The number of people who have drowned in swimming pools is clearly correlated with the number of films in which Nicolas Cage was an actor.
- The amount of mozzarella eaten and the number of people who received PhDs also correlated over the years 2000-2009. Perhaps mozzarella really does affect the brain and stimulate the desire to make scientific discoveries.
- The age of the women who have won the title of “Miss America” correlates with the number of people killed by steam and hot vapours.
- The number of people who died as a result of a traffic accident clearly correlates with the amount of sour cream that people eat.
- Few people know that the more chicken a person eats, the more crude oil is imported in the world. True, this is also an example of a false correlation. By the way, the import of crude oil from Norway is closely related to the number of people who died in a car-train collision. And in this case, the correlation is almost 100 percent.
- And margarine is tied to divorce statistics. The more margarine people in Maine consumed, the higher the divorce rate. True, there may be a grain of truth here. After all, margarine consumption is inversely correlated with a family’s economic situation, and a poor economic situation is in turn directly related to the number of divorces. And that has been scientifically proven. So who knows, maybe this correlation is not so false. True, no one has verified this.
- The amount of money the US government spends on science, space, and technology is closely related to the number of suicides carried out by hanging or strangulation.
And finally, another example of a false correlation – the more cheese people eat, the more people die because they get tangled in their sheets.
Therefore, although correlation is an effective statistical tool, one must learn to tell true relationships between phenomena from false ones. Otherwise, a study can end up with “interesting” results like the ones above. Now let’s move on to how to conduct correlation analysis in Excel.
Correlation analysis in Excel – 2 ways
The correlation coefficient can be calculated in two ways. The first is the Function Wizard, which lets you enter the CORREL formula. The second is the analysis package (the Analysis ToolPak add-in), which has to be enabled separately.
How to Calculate the Correlation Coefficient
Let’s demonstrate how to obtain the correlation coefficient on a real case. Say we have a table with sales figures and advertising costs. We need to understand how closely the sales value and the amount of money spent on promotion are related.
Method 1. Determining the correlation using the Function Wizard
The CORREL function is one of the simplest ways to accomplish the task. Its general form is CORREL(array1; array2). The argument separator depends on regional settings: a semicolon in some locales, a comma in others. To enter the function, perform the following steps:
- Using the left mouse button, select the cell in which the resulting correlation coefficient will appear. Then click the fx button to the left of the formula bar, which opens the function input tool.
- Next, select the category “Full alphabetical list” (“All” in English versions of Excel) and find the CORREL function in it. As the category name suggests, all function names are listed in alphabetical order.
- A window for entering the function arguments opens. There are two arguments, each of which is an array of data; the two arrays are compared with each other. In the “Array1” field, specify the address of the first range, and in the “Array2” field, the address of the second range. To enter a range, click in the corresponding field with the left mouse button and select the range on the sheet.
- After entering the arguments, press the “OK” button to confirm.
After completing the steps above, the correlation coefficient appears in the cell selected in the first step. In our example, it is 0.97, which indicates a very strong relationship between the data of the two ranges.
Method 2. Calculating the correlation using the analysis package
Another good tool for determining the correlation between two ranges is the analysis package (the Analysis ToolPak add-in). Before we can use it, we need to enable it. To do this, perform the following steps:
- Click on the “File” button, which is located in the upper left corner immediately next to the “Home” tab.
- After that, open the “Options” section.
- In the menu on the left, go to the penultimate item, “Add-ins”, and left-click it.
- The add-in management pane opens. Set the “Manage” box at the bottom to “Excel Add-ins” and click “Go”. If the box already shows this value, leave it as it is.
- Then enable the analysis package: tick the “Analysis ToolPak” checkbox and click the “OK” button.
That’s it, the add-in is now enabled, and the “Data Analysis” button should be visible on the “Data” tab. If it has appeared, we did everything right. We click on it.
A list appears with a choice of different ways to analyze the information. We should select the “Correlation” item and click on “OK”.
Then we need to enter the settings. The main difference from the previous method is that we enter the entire range at once rather than splitting it into two parts. In our case, this is the data in the two columns “Advertising Costs” and “Sales Value”.
We do not change the “Grouped By” parameter. The default, “Columns”, is correct for our table. This setting tells the program how the data is arranged. If our data were laid out in two rows, we would switch it to “Rows”.
In the output options, “New Worksheet Ply” is already selected, so the correlation results will be placed on a separate sheet. You can choose a different location with the corresponding switch: a range on the current sheet or a new workbook. Check that all settings are entered correctly, then confirm by pressing the “OK” button.
Since we left the output location unchanged, we are taken to a new sheet, where the correlation coefficient can be found. Of course, it is the same as in the previous method: 0.97. The calculation is the same and the initial data did not change; only the method differs.
Thus, Excel offers two methods for correlation analysis. As you have seen, both give the same result, so each user can choose the method that suits them best.
How to build a correlation field in Excel
So let’s now figure out how to build a correlation field. First, what is it? A correlation field is a correlation graph, that is, a scatter plot of the two variables. The main requirement for such a chart is that each point corresponds to one unit of the population. The correlation field helps to see the relationship more clearly and to analyze the data in more depth. First we find the correlation coefficient between the two ranges using the CORREL function.
Once we’ve done that, we can build the correlation field. To do this, perform the following steps:
- Go to the “Insert” tab and find the “Scatter” chart type there.
- After adding it, right-click the future correlation field to open the context menu, then click “Select Data”.
- Select our range as the data source and confirm by pressing the OK button. The program will do the rest on its own.
The chart does not depend on how the correlation was calculated, so it can be built whether or not the coefficient was obtained with the CORREL function.
Scatter diagram. Correlation field
Some users still work with an older version of Excel. How do you build a correlation field in this case? There is a special tool for this called the Chart Wizard. You can find it on the toolbar by its chart icon. If you hover over this icon with the mouse, a tooltip appears confirming that this is indeed the Chart Wizard.
After that, a dialog box appears in which we select the scatter chart type. The logic in older versions of the office suite is generally the same; only the interface is slightly different. A little to the right, we can see what the scatter plot will look like, choose the appropriate view, and read the description of this chart type. After that, click the “Next” button.
Then we select the data range, and the points appear on the chart. After that, you can add a regression line to the graph. To do this, right-click one of the points, find “Add Trendline” in the list that appears, and click this item.
Next, set the options. We are interested in the “Linear” type, and in the options window you need to check the “Display Equation on chart” box.
After confirming the actions, we get a scatter plot with a trend line and its equation.
As you can see, there are many ways to build such a chart.