One Continuous and One Categorical Variable

So far in our earlier blogs, we have discussed the analysis of a single continuous variable, a single categorical variable, two continuous variables, and two categorical variables. In this blog, we will learn the analysis of two variables (one continuous, one categorical).

Analysis of Two Variables | One Continuous and Other Categorical

Method | Approach
Tabular Method | Formulate a table by aggregating the continuous variable (e.g., count, sum, mean) for each category of the categorical variable.
Graphical Method | Box Plot

The table above lists the most common descriptive methods for analyzing two variables (one continuous, one categorical). Let's take one continuous variable and one categorical variable from our MBA Students data and analyze them.
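Before turning to the real data, here is a minimal sketch of the tabular method using base R's tapply(), which applies a function to the continuous values within each category. The data frame toy_df and its values are made up purely for illustration; they are not from the MBA Students data.

```r
# Hypothetical toy data: made-up values for illustration only
toy_df = data.frame(
  group = c("A", "A", "B", "B", "B"),
  value = c(10, 20, 5, 5, 8)
)

# Aggregate the continuous variable within each category
counts = tapply(toy_df$value, toy_df$group, length)
sums   = tapply(toy_df$value, toy_df$group, sum)
means  = tapply(toy_df$value, toy_df$group, mean)

# Combine the measures into one table (one row per category)
report = data.frame(group = names(counts),
                    count = as.vector(counts),
                    sum   = as.vector(sums),
                    mean  = round(as.vector(means), 1))
print(report)
```

Each row of report is one category, and each column is one aggregation measure, which is exactly the layout of the tabular report built later with aggregate().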

Importing MBA Students Data in R

First, let's import the MBA Students data into R. The R code to import 'MBA_Students_Data.csv' is given below:

#Set directory as per your folder file path
setwd("D:/k2analytics/datafile")
getwd()

#Read the File
mba_df = read.csv("MBA_Students_Data.csv", header = TRUE)

12th Standard Stream Vs Working Experience in Months.

Variable | 12th Standard Stream | Work Experience in Months
Variable Name | ten_plus_2_stream | work_exp_in_mths
Description | The 12th standard stream of the student, e.g., Science or Commerce | The work experience of the student, in months
Variable Type | Categorical | Continuous

Before analyzing two variables together, analyze each variable individually. We leave this as a practice exercise, since we have already discussed the analysis of a single continuous variable and a single categorical variable in our previous blogs.
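As a starting point for that practice, here is a minimal sketch using base R's summary() and table(). The toy data frame below reuses the MBA column names, but its values are invented for illustration.

```r
# Hypothetical toy data mirroring the two MBA columns (values are invented)
toy_df = data.frame(
  ten_plus_2_stream = c("Science", "Commerce", "Commerce", "Science PCM", "Commerce"),
  work_exp_in_mths  = c(24, NA, 6, 12, 0)
)

# Continuous variable: min, quartiles, median, mean, max (plus the NA count)
print(summary(toy_df$work_exp_in_mths))
n_missing = sum(is.na(toy_df$work_exp_in_mths))
cat("Missing work experience values:", n_missing, "\n")

# Categorical variable: frequency of each category
stream_freq = table(toy_df$ten_plus_2_stream)
print(stream_freq)
```

The frequency table is also a quick way to see which raw stream labels exist before they are recategorized.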

Data Preparation

The work_exp_in_mths variable contains missing values (NA). Let's replace them with 0, assuming these students have no prior work experience.

#Data Preparation: replace missing work experience with 0
mba_df$work_exp_in_mths[is.na(mba_df$work_exp_in_mths)] = 0
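The indexing idiom above can be checked on a small vector: is.na() builds a logical index of the missing entries, and assigning through that index replaces only those entries. The vector work_exp below is hypothetical.

```r
# Hypothetical work-experience values in months, with two missing entries
work_exp = c(12, NA, 30, NA, 0)

# Logical index: TRUE where the value is missing
missing_idx = is.na(work_exp)

# Replace only the missing entries with 0 (assumes NA means "no prior experience")
work_exp[missing_idx] = 0
print(work_exp)
```

Keeping missing_idx around also makes it easy to report how many values were imputed.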

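Next, recategorize the students' 12th standard stream into two categories: Science and Commerce. Any stream whose name contains "COMMERCE" (case-insensitive) becomes COMMERCE; every other stream becomes SCIENCE.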

#Recategorizing: map a single stream label to COMMERCE or SCIENCE
ten_plus_2_stream_recat = function(x){
  x = toupper(x)
  if (grepl("COMMERCE", x)){
    return("COMMERCE")
  } else {
    return("SCIENCE")
  }
}

#Recategorizing ten_plus_2_stream (lapply returns a list)
mba_df$ten_plus_2_stream_recat = lapply(mba_df$ten_plus_2_stream, ten_plus_2_stream_recat)

# Converting List to Vector
mba_df$ten_plus_2_stream_recat = as.vector(unlist(mba_df$ten_plus_2_stream_recat))
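Because grepl() is already vectorized, the same recategorization can be written without lapply() and unlist() by using ifelse() over the whole column. This is a sketch of an alternative, not the original author's code; the stream labels below are made up.

```r
# Hypothetical stream labels, including mixed case, a non-Commerce stream and an NA
streams = c("Commerce", "Science PCM", "commerce with maths", "Arts", NA)

# Vectorized recategorization: labels containing COMMERCE become "COMMERCE";
# everything else (including NA, since grepl() returns FALSE for NA) becomes "SCIENCE"
stream_recat = ifelse(grepl("COMMERCE", toupper(streams)), "COMMERCE", "SCIENCE")
print(stream_recat)
```

Applied to the MBA data, this would be mba_df$ten_plus_2_stream_recat = ifelse(grepl("COMMERCE", toupper(mba_df$ten_plus_2_stream)), "COMMERCE", "SCIENCE"). Like the original function, it maps every non-Commerce stream (such as Arts) and every missing value to SCIENCE, so it is worth checking the distinct values of ten_plus_2_stream first.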

Tabular Report

The easiest way to analyze a categorical and a continuous variable together is to create a tabular report. The R code to create the tabular report is given below:

#Aggregating: count, sum and mean of work experience for each stream
aggr = aggregate(mba_df$work_exp_in_mths, by = list(mba_df$ten_plus_2_stream_recat),
                 FUN = function(x) c(count = round(length(x)),
                                     sum   = round(sum(x)),
                                     mean  = round(mean(x), 1)))

#Renaming Columns
colnames(aggr) = c("stream", "work_exp")
print(aggr)

#Output / Tabular Report

    stream work_exp.count work_exp.sum work_exp.mean
1 COMMERCE          126.0       1250.0           9.9
2  SCIENCE           74.0       1237.0          16.7
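aggregate() also has a formula interface, which keeps the grouping column's name instead of the generic Group.1. The sketch below runs it on a hypothetical toy data frame (the values are invented) and produces one row per stream with count, sum and mean.

```r
# Hypothetical toy data with the same column names as the prepared MBA data
toy_df = data.frame(
  ten_plus_2_stream_recat = c("COMMERCE", "COMMERCE", "SCIENCE", "SCIENCE", "SCIENCE"),
  work_exp_in_mths        = c(0, 12, 24, 6, 30)
)

# Formula interface: continuous variable ~ grouping variable
aggr_toy = aggregate(work_exp_in_mths ~ ten_plus_2_stream_recat, data = toy_df,
                     FUN = function(x) c(count = length(x),
                                         sum   = sum(x),
                                         mean  = round(mean(x), 1)))
print(aggr_toy)
```

One difference to keep in mind: the formula method drops rows with NA by default (na.action = na.omit), so run it after the NA replacement in the Data Preparation step if those students should be counted.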

Interpretation

In the output table above, we look at two important measures: sum and mean.

  • Based on the sum: combined, Commerce students have more total work experience (1,250 months) than Science students (1,237 months).
  • Based on the mean: on average, Science students have more work experience (16.7 months) than Commerce students (9.9 months).
  • In this scenario, the mean makes more sense than the sum, because the two groups differ in size (126 Commerce vs 74 Science students) and the sum largely reflects group size. Hence, it is important for the data analyst to choose the right aggregation measure.
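The sum misleads here for an arithmetic reason: a group's sum equals its count times its mean, so the larger Commerce group can have the larger total even with a much lower average. The sketch below recomputes the means from the counts and sums in the tabular report above, and compares each stream's share of students with its share of total work experience.

```r
# Counts and sums taken from the tabular report above
counts = c(COMMERCE = 126, SCIENCE = 74)
sums   = c(COMMERCE = 1250, SCIENCE = 1237)

# Mean = sum / count, per stream
means = round(sums / counts, 1)
print(means)

# Share of students vs share of total work experience, per stream
student_share = round(counts / sum(counts), 2)
exp_share     = round(sums / sum(sums), 2)
print(student_share)
print(exp_share)
```

Commerce students make up about 63% of the students but only about half of the total work experience, which is why their mean is lower.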

Graphical Methods | Boxplot

A box plot quickly shows the distribution of a variable and helps to spot outliers. It is one of the most common ways to visualize a continuous variable across the categories of a categorical variable. The R code to create the box plot is given below:

#Box plot of work experience for each 12th standard stream
boxplot(mba_df$work_exp_in_mths ~ mba_df$ten_plus_2_stream_recat,
        xlab = "12th Standard Stream",
        ylab = "Work Experience in Months",
        main = "Box Plot \n 12th Standard Stream Vs Work Experience",
        col  = c("royalblue", "gold"))

Boxplot | 12th Standard Stream Vs Working Experience

Interpretation

From the box plot above, we can interpret the following:

  1. The average prior work experience of Science students (about 17 months, from the tabular report) is greater than that of Commerce students (about 10 months). Note that the thick line inside each box marks the median, not the mean.
  2. The third quartile (Q3) of the work experience of Commerce students is very close to the median (Q2) of the work experience of Science students.
  3. The box plot flags outliers in the work experience of Commerce students. Since these are genuine work-experience values, they should not be treated as errors; they simply show that a few Commerce students have considerably more work experience than the other Commerce students.
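The quartiles and outliers read off a box plot can also be checked numerically. boxplot.stats() returns the whisker ends, the hinges (close to Q1 and Q3) and the median in $stats, and the points drawn individually in $out, using the same rule that boxplot() applies by default: values more than 1.5 times the box length beyond a hinge are flagged. The vector below is hypothetical.

```r
# Hypothetical work-experience values (months) for one stream
work_exp = c(0, 0, 6, 8, 10, 12, 14, 60)

# Same rule boxplot() uses by default (coef = 1.5)
bp = boxplot.stats(work_exp, coef = 1.5)

# $stats: lower whisker end, lower hinge, median, upper hinge, upper whisker end
print(bp$stats)
# $out: values drawn as individual outlier points
print(bp$out)
```

With the MBA data, boxplot.stats(mba_df$work_exp_in_mths[mba_df$ten_plus_2_stream_recat == "COMMERCE"])$out would list the Commerce values plotted as outliers, so you can inspect them before deciding they are genuine.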

NOTE: A density plot can also be used to visualize a continuous variable by category. However, a box plot is more informative at a glance, since it shows the median, quartiles and outliers directly. We leave the density plot as a practice exercise for our readers.

Practice Exercise

  • Analyze the MBA Specialization with the MBA Grades.
  • Analyze the MBA Specialization with the Graduation Percentages.

Upcoming Blog

In the upcoming blog, we will learn "Analysis of Time and Continuous Variables".


Source: https://www.k2analytics.co.in/analysis-of-two-variables-one-categorical-and-other-continuous/
