Learn what logistic regression is, what types of logistic regression you can perform, and how it differs from linear regression.
Logistical regression is an asset to any data analyst’s toolkit. With logistic regression, you can classify your variables and derive insights about how they interact to make informed decisions and predictions. It is an important statistical analysis technique for social and natural sciences, with applications across many industries. Depending on your variable types, you can choose to perform binary, multinomial, or ordinal logistic regression.
Explore logistic regression, how the algorithm works, and how to find the regression type most suitable for your needs.
Logistic regression, in its most basic form, is binary logistic regression. Binary logistic regression is a statistical tool for predicting a binary outcome (two possible outcomes) based on the value of several variables. For example, you could predict whether someone will be healthy or sick at 80 years old based on their eating patterns, body weight, and health status.
Logistic regression is an algorithm that assesses the relationship between variables using existing data and then uses this relationship to predict future outcomes. For example, a logistic regression algorithm might find that in an existing data set, 95 percent of patients with a particular genetic mutation were diagnosed with diabetes. If the algorithm were then given a new set of patient data, it would predict that patients with the same genetic mutation would be highly likely to be diagnosed with diabetes in the future.
This type of regression models the probability of a particular class or event occurring, such as pass/fail, win/lose, or yes/no. The focus on probability and binary outcomes is what separates logistic regression from many other predictive models, such as linear regression. Depending on your needs, you can choose to expand beyond binary outcomes with more complex forms of logistic regression, including ordinal and multinomial logistic regression, which we will explore further in this article.
Logistic regression is a robust algorithm frequently used in machine learning and statistics to predict the probability of an outcome by fitting data to a logistic function. The process begins with a clearly defined research question aimed at predicting a particular outcome, such as determining the likelihood of rain impacting monthly sales or identifying a type of credit card activity.
Logistic regression models rely on historical data related to these outcomes, which are used to train the model. The independent variables in this model might be the number of rainy days in a month or specific credit card activity patterns, while the dependent variable could be the sales for that month or the type of credit card activity. After applying the logistic function, the result is an S-curve, or sigmoid function, that indicates the probability of the dependent variable occurring. This sigmoid function has the following format:
S(x) = 1/(1+e^(-z))
Essentially, we take the values of our input variables (predictors) and ask the question, which class of our output does this data point belong to? The model predicts that the data point belongs to that class if the probability is above a certain threshold, typically 0.5. This function allows us to predict the likelihood of an event given the values of the independent variables, making logistic regression an important tool in areas where understanding probabilities can drive impactful decision-making.
Logistic regression can handle numeric and categorical input variables and work with nonlinear relationships between predictors. On the output side, logistic regression provides a clear probabilistic outcome. You might find these functionalities beneficial in situations where knowing the probability of an event, and not just the predicted class, is important. Based on this, you might choose to use logistic regression in real-world scenarios such as:
Predicting whether someone will repay a loan
Predicting the likelihood of a medical outcome
Predicting the purchase a customer will make
Predicting the revenue for a company in a given month
Predicting the vacation location someone will choose
You can choose between several logistic regression types depending on your outcome variable.
You would choose to use binary logistic regression when the dependent variable—the outcome we're interested in predicting—can take only two possible values.
For instance, a bank might want to predict whether a loan applicant will default (1) or not default (0). The dependent variable, in this case, is binary. Binary outcomes can only take two possible values. The bank could use a range of independent variables, like income level, credit history, and age, to predict this outcome.
The binary logistic regression model would use these variables to predict the probability of an applicant defaulting on their loan. This could help the bank decide whether that person is a good candidate for a loan.
You can use binary logistic regression in many industries. For example, you may use it to predict whether a patient gets a disease, whether a buyer makes a purchase, or whether a student completes their degree.
You would choose multinomial logistic regression when you have more than two categories as outcomes, and these categories are unordered (your outcome variable is nominal). For example, you might predict whether a customer is more likely to buy a shirt, pants, or socks based on factors such as their age, location, and career choice.
Multinomial logistic regression uses a reference category to determine the probability of each outcome. For example, you might choose to buy a shirt (0) as your reference category and to buy a pair of pants (1) or socks (2) as comparison levels. Your equation could then tell you the probability of each outcome in relation to the baseline category. Similar to binary logistic regression, you can use this type of logistic regression across industries. For example, you might predict which disease a patient may develop, which food a customer will choose, or which career a person may decide to pursue.
You would choose ordinal logistic regression when the dependent variable is ordinal. An ordinal variable is a categorical variable with an order (or ranking) to the categories.
For instance, consider a survey that asks respondents to rate a product on a scale from one to five in order of least to most satisfied. In this case, not only are there more than two possible responses, but these responses also have a natural order. Someone who scores the product a four is more satisfied than someone who scores it a three.
Real-world examples of where you might find ordinal regression include which size of a beverage a customer purchases, how high a student ranks a class, which place a person scored in a sporting event, and so on.
Though both logistic and linear regression predict an outcome based on previous data, they cater to different needs. Linear regression works best when the outcome variable is continuous, and the relationship between variables is linear. This type of regression shows how an outcome variable’s value shifts based on alterations of the independent variables.
On the other hand, logistic regression is used when the outcome variable is categorical, and the relationship between variables isn’t strictly linear. Sometimes, you may categorize your continuous variable into groupings to conduct a logistic regression. This type of regression generally has discrete outcome values that can be binary, unordered categorical (ordinal), or ordered categorical (nominal).
One major benefit of logistic regression is that it can be used with nonlinear data. With linear regression, you predict how a variable increases or decreases based on changes in explanatory variables. With logistic regression, you can model complex relationships that do not rely on linearity, including classification models, which are extremely important in machine learning. Logistic regression can also process data at high speeds while remaining flexible to several types of research questions.
Another benefit of logistic regression is that it is considered less complex than other machine learning methods. Logistic regression is a discriminative classifier, which is simpler than a generative classifier, such as naive Bayes. In applications such as machine learning, logistic regression can perform tasks such as determining whether an image falls into a certain category. If you uploaded a batch of images of animals and wanted to separate them into “brown animals” and “not brown animals,” a logistic regression function could sort and classify these images.
Professionals in many industries use logistic regression, including health care, manufacturing, finance, and research. This means that regardless of your industry and interests, you can utilize logistic regression methods to examine the relationship between your variables. Rather than thinking about logistic regression as its own field, think of it as a method you can learn and then apply in your area of specialty.
To build skills in logistic regression, you can take various online courses and bootcamps to strengthen related skills in mathematics and statistics, as well as learn how to apply logistic regression in machine learning and data science fields.
Logistic regression is a common skill in careers using statistics and machine learning. By building skills in this area, you can help prepare yourself for an entry-level profession such as:
Data analyst
Statistician
Medical researcher
Marketer
Logistic regression is a predictive analytics model popular among professionals in science and mathematical fields. Depending on your variable types, you can choose different types of logistic regression, including binary, ordinal, and multinomial options.
To learn more about logistic regression and data analysis, take exciting courses on the Coursera learning platform. To start, choose from various courses that utilize different statistical software, such as Logistic Regression with NumPy and Python, Logistic Regression in R for Public Health, and Predictive Modeling with Logistic Regression using SAS.
Editorial Team
Coursera’s editorial team is comprised of highly experienced professional editors, writers, and fact...
This content has been made available for informational purposes only. Learners are advised to conduct additional research to ensure that courses and other credentials pursued meet their personal, professional, and financial goals.