Trending February 2024 # Most Useful Terminal Commands For Macos (2023 Updated) # Suggested March 2024 # Top 10 Popular

You are reading the article Most Useful Terminal Commands For Macos (2023 Updated) updated in February 2024 on the website We hope that the information we have shared is helpful to you. If you find the content interesting and meaningful, please share it with your friends and continue to follow and support us for the latest updates. Suggested March 2024 Most Useful Terminal Commands For Macos (2023 Updated)

Terminal is a CLI (Command Line Interface), the language we type which interacts with the Mac. This tool is often overlooked as it is different from the GUI (Graphical User Interface), offering a rich interface. However, I’ve a useful list of macOS Terminal Commands that you can learn easily and do things instantly. Let’s start with the basics!

How to open Terminal on Mac

Spotlight is undoubtedly the easiest way to open Terminal on Mac. Follow the steps below to open Terminal.

Type Terminal in the Spotlight Search bar.

Now, let’s see how to get most of Terminal!

12 macOS Terminal commands to supercharge your Mac experience

1. Force your Mac to stay awake

It is annoying when your Mac goes to sleep when you are off for a short break. Of course, you can change Sleep Settings in System Preferences. However, it is easier to use Terminal to keep your Mac awake with the following command.


That’s not all! You can also add a timer to the command. Doing so will disable “caffeinate” mode after a preset time. You need to put the -t flag and specify the time in seconds, as shown below.

caffeinate -t 150000

In this case, your Mac will exit the mode after 15,000 seconds or 250 minutes. You can increase or decrease the timer by changing the number of seconds in the command.

2. Flush DNS cache

I have seldom faced issues with Mac’s DNS cache. Yet I would suggest you flush DNS cache on Mac every once in a while. Not doing so will lead to problems like loading websites and 404 errors. (add screenshot- macos-shell-flush-dns)

Enter the following command to flush DNS on Mac.

udo dscacheutil -flushcache; sudo killall -HUP mDNSResponder

Note: This command works only on macOS El Capitan and above. After flushing DNS, check if you can access the website with the issue.

3. Increase spacing between Dock apps

Too many apps in Dock triggering your OCD? Well, you can increase the spacing between each apps using terminal commands. Once done, your Dock will look neat and tidy.

defaults write chúng tôi persistent-apps -array-add '{tile-data={}; tile-type="spacer-tile";}'

And hit Return.

Once done, type: killall Dock and press Return again.

4. Change default screenshot name

Mac saves screenshot with date and time as default. The naming convention tends to look unprofessional. Worry not; once again macOS terminal comes to the rescue. You can change the default name for a screenshot using the below command.

defaults write name "New Screen Shot Name"

Now type:

killall SystemUIServer

5. Change default screenshot format

Now that you have fixed the screenshot naming, how about changing the format? macOS saves a screenshot in the PNG format. Some online portals like immigration require images to be in .jpg format.

Spending time to convert saved images to other formats is not ideal. Instead, you can use the terminal code to change the default image format. Furthermore, you can choose between Jpeg, TIFF, GIF, or even RAW (ideal for post-processing photos.)

defaults write type jpg

5. Download files without a browser

Want to download a file directly? With Terminal, you can download a file directly from the Internet. This method is useful only if you have a direct download link. Type the following command.

curl -O [URL of file you want to download]

6. Compress and password-protect folders

I recommend password-protecting sensitive data before sharing it with anyone. You can share the password separately with recipients. Using Terminal, you can compress and password-protect the folder. You need to navigate to Desktop and select the folder using the below command.

cd ~/Desktop/

Select the folder

Swap Output folder chúng tôi with the desired name. Specify the origin in the source folder name. In other words, you need to mention the origin and target file names. Interestingly you can also change the extension of the output file. Simply add an extension (e.g., .pdf) at the end of the above command.

7. Display hidden files and folders

macOS hides critical files. The fail-safe mechanism assures that you don’t delete a system file by mistake. Doing so could crash your Mac. However, the feature becomes a limitation whenever you want to view hidden files on an external drive.

The solution for this is the below Terminal command that lets you view hidden files.

defaults write AppleShowAllFiles -bool TRUE

Now you will get to see all the hidden files.


killall Finder

A word of caution: Don’t delete important system files. Before deleting anything, run a Google search. Use “False” instead of “True” in the above command to hide files again.

8. Access iCloud Drive using Terminal

To access the data from your iCloud Drive, use the following command.

cd ~/Library/Mobile Documents/com~apple~CloudDocs/

However, we already have a detailed guide on how you can access, copy, or move data to your iCloud Drive, which you can check out anytime.

9. Shut Down or restart Mac using Terminal.

To shut down your mac with CLI (Command Line Interface) aka Terminal, use.

sudo shutdown -h now

Just as shut down, you can restart your Mac by

sudo shutdown -r now

10. Supercharge Time Machine backup

Whenever you’re updating the Mac to the new version, backups are essential. The easiest way to take a backup is to use Time Machine. But do you know? You can speed up time machine backup using the terminal by this command.

sudo sysctl debug.lowpri_throttle_enabled=1

11. Copy contents from one folder to another

Copying contents from one place to another is fairly easy with Terminal. Type in the following command:

ditto -V ~/original/folder/ ~/new/folder/

Replace original with the current directory and new with the name of the directory to which you want to copy the contents.

12. Make your Mac say anything you want

This is the coolest command that macOS provides. You can make your Mac say anything you want by using the say command followed by the words.


Terminal commands ready reckoner

Impressed with all that you can do with the above Terminal Commands? We have curated and compiled the most useful macOS Terminal commands in the table below. Furthermore, the commands are segregated based on their utility.

That’s it!

These were some of the most common and useful Terminal commands. I hope this helped speed up your work and become a pro Mac user.

Additionally, if you are looking for some quick shortcuts to increase your productivity, check out our ebook on 200+ Mac keyboard shortcuts.

Author Profile


Mahit is an engineer by Education with a corporate stint to his name. He ditched the corporate boardroom wars in favor of the technology battleground. For the better part of a decade, he has worked for popular publishing outlets, including Dennis Publishing, BGR India, AppStorm, MakeUseOf, and iPhonehacks.

You're reading Most Useful Terminal Commands For Macos (2023 Updated)

10 Most Useful Microsoft Word Tips & Tricks

Microsoft Word is one of our favorite text editors. With such a big array of features, Microsoft Office Word can look complicated. There are many hidden tricks and shortcuts that make text editing easier. Here are some tips I think will help you when you are using Microsoft Word.

Microsoft Word Tips And Tricks 1. Vertical Selection Of Text

Normally, we select a character, a word, a sentence or a paragraph. All these selections are horizontal selections. Sometimes you may need to select vertically. For example, if your text has numbers in the beginning, you may want to select only the numbers to delete them at one go (see figure).

2. Default Line Spacing

The default line spacing in Microsoft Word is 1.15 against 1 in Microsoft Word 2003. Microsoft changed the line spacing to make your text more readable. If you want the default line spacing as 1, follow this procedure:

In the Format list that appears, select Paragraph

Under Spacing, change the line spacing from 1.15 to 1

Check the box against “New documents based on this template.”

3. Changing The Default Save Location

By default, MS Word opens Documents folder when you press CTRL+S for the first time. If you think this is irritating you, you can change the default file location to some other place where you normally store your documents.

In the right part of the window, scroll down to the button that says “File Locations”

4. Change Default Font

The default font for new documents in MS Word is Calibri. Though the font is good for online viewing, it creates problems when printing. You may be using Times New Roman or Arial for print jobs. One method is to change the font manually each time after you have typed the document. But then, it would involve formatting the document again. Another method is to change the default font.

In the Font dialog box, select the font you wish to use with every document.

Make any other changes you wish such as font size etc

5. Move Rows Of Text In Table

Sometimes when you are working on the table, you may want to move one or more rows in the table up or down without having to change the table formatting. One method is copy-pasting but that risks formatting.

Another method is using ALT+SHIFT+UP arrow key to move an entire row up. Similarly, to move the entire row down, use ALT+SHIFT+DN arrow key. Note that you have to select the row before you can move it using the ALT+SHIFT+Arrow keys. This method makes sure the formatting is not disturbed.

6. Quickly Change Line Spacing

Sometimes need arises that you have to change line spacing among different paragraphs. Here are the shortcut keys:

Note that you just need to place the cursor on the paragraph that needs to be styled. You need not select the paragraph.

7. Quickly Adding Borders to Paragraphs

If you wish to add borders to some paragraphs, you can use the Borders and Shading dialog box. However, if your need is just to add the bottom border to text/paragraph, you can do it by adding three special characters and hitting Enter.

Press – (hyphen) three times and press Enter to draw an underline border of 3/4 points

Press _ (underscore) three times and press Enter to draw an underline border of 1.5 points

Press ~ (tilde) three times and press Enter to draw a zigzag underline border

Press * (asterisk) three times and press Enter to draw a dotted underline border

Press = (equal to) three times and press Enter to draw a double underline border

8. Find Special Formatting

You can find text that is specially formatted. For example, you can find highlighted text or text whose font is Times New Roman. You can also search for bold text or italics. There are many more options when you use the Find option.

Press CTRL+F to open the Find pane. In Word it appears to the left side of the window.

You can see plenty of options under Format.

9. Merging Formatting When Pasting Across Documents

When you copy anything from another document and paste it in the current document, you will want the copied text to match the formatting of the current document. While you can manually format each time you copy text from other documents to the current one, you can also set the default paste to merge formatting so that the text copied from other sources acquires the formatting of the current document.

In the window that appears, Select Merge Destinations in 1] When Pasting in the same Document and 2] When Pasting between Documents.

10. Copy Only Formatting

Sometimes you may want to apply an already existing formatting from one part of your document to another part. You have the Format Painter for the purpose. Using the Format Painter can be irritating when dealing with long documents.  Here is another method that is easier to use.

Press CTRL+SHIFT+C instead of CTRL+C. This will copy only the formatting and leave the text.

Move to the destination where the formatting is to be applied. Select the text to which formatting is to be applied. Press CTRL+SHIFT+V to paste the formatting to the selection.

Top 14 Sdet Interview Questions And Answers {Updated For 2023}

Introduction to SDET Interview Questions and Answers

The following article provides an outline for SDET Interview Questions. SDET, Software Design Engineer in Test or Software Development Engineer in Test, stands for mainly testing performed on a software product. It needed some candidates who can able to develop and as well as perform testing. Microsoft initially started this, but currently, other organizations are very conscious of the same, and they are looking for someone who is an expert in SDET for involving in the full development of their product as well as involving with the testing design, which needs to be performed for that individual development. The organization can introduce the same resource in two key tasks that will always be profitable.

Start Your Free Software Development Course

If you are looking for a job related to SDET, you must prepare for the 2023 SDET Interview Questions. Every interview is indeed different as per the various job profiles. Here, we have prepared the important SDET Interview Questions and Answers to help you succeed in your interview. This 2023 SDET Interview Questions article will present the ten most important and frequently asked SDET interview questions.

These interview questions are divided into two parts as follows:

Part 1 – SDET Interview Questions (Basic)

This first part covers basic Interview Questions and Answers.

1. Explain differences in detail between software development engineering in test (SDET) and testing software manually?

SDET is mainly using doe automation testing. This means developing a product that can be tested automatically without manual intervention. Whereas manual testing does not at all meet these criteria.

2. Write a program to reverse a number in any language?


public class reverseNumber { public long reverse(long num) { long temp=0; while(num!=0) { temp=(temp*10)+(num%10); num=num/10; } return temp; } public static void main(String args[]) { long n= 654312; reverseNumber inp = new reverseNumber(); System.out.println(“Given number is “+ n); System.out.println(“Reverse of given number is “+inp.reverse(n)); } } 3. Explain how we can define ad-hoc testing in the current IT industry?

Ad hoc testing is one of the testings very much famous in the IT industry. This kind of testing is mainly unplanned and without documentation. It usually needs to perform when some ad hoc requirements come from the client; the developer has to develop in the same priority manner. Now tester needs to test it immediately and develop proper deliverables in a minimal period. Documentation or planning is not always possible, but some organizations maintain specific tools for tracking this task, especially for additional billing.

4. Two big keywords normally benefit the tester, one is the priority, and another is severity; explain the difference between them in detail.

Priority and severity are essential keywords in the IT industry, especially for those involved in the production support activity of their provided product or any client’s existing system. Currently, all the big organizations try to follow one specific tool where one helpdesk team has been assigned for handling. Typically, end-users reach that corresponding helpdesk team to raise their concerns, or end-users can create their concerns directly in that specific tool.

Some helpdesk person first analyzes the same, then gives the priority based on the end-user impact. A Helpdesk person, tester, developer, and some point-of-time business analyst are involved with that issue and try to understand the exact impact of that specific issue based on that they have given the severity of that issue. So priority defines how important that issue is, and severity is defined as the impact or destruction ability.

5. Explain a detailed explanation of the job responsibility of a tester or Software Development Engineering in a test role?

Write automation of testing and set up the same for varieties platforms like web or mobile.

Managing and handling bug reports.

Maintaining the proper communication channel between the developer and the client.

Preparing and delivering test cases.

6. What is ad-hoc testing?

Ad-hoc testing is defined as the testing done on an ad-hoc basis without any reference and proper inputs to the test case and without any plan, test cases, and documentation. This type of testing’s main objective is to find defects and break the application by executing different application flows or random functionality.

Ad-hoc testing is an informal way of finding bugs in an application and can be performed by anyone on the team. It will be difficult to find bugs without test cases, but sometimes during ad-hoc testing, bugs will find that we didn’t find through normal testing or existing test cases.

7. Give some examples regarding the typical experiences or excessive load working day of a tester or software development engineer in test (SDET) resources?

Three key tasks are always taken a huge amount of time for the tester on any day:

Understanding the requirements of the project.

Preparing and executing required test cases based on the client’s expected functionalities.

Reporting the bugs identified on individual functionality developed for the client to the developer and retesting the same after redelivery by the developer to ensure expected functionality is properly delivered without any common bug.

Part 2 – SDET Interview Questions (Advanced)

This is one critical decision, so a single person or junior guy has never taken it. Only the developer and tester are not involved in bringing this decision; higher management is periodically involved in that. Management test mainly ensure by validating below to ensure product delivery are bugless:

Validating bug reports provided by the tester. How was the bug resolved, and retesting done by the tester or not?

Validating all the test cases written by the tester for that specific functionality, documentation, and confirmation taken from the tester on the same.

Run automated test cases to ensure new functionalities do not break existing functionality.

Sometimes validating test coverage report ensures all the developing component has been covered by test cases written.

9. Write a program to swap two numbers without using any temp variable?

The program to swap two numbers without using any temp variable is as below:

public class swap{ public static void main (String args[]) { int x = 20; int y =30; System.out.println(“Numbers before swapping”); System.out.println(“ number x is “ + x); System.out.println(“number y is “ +y); x= x+y; y=x-y; x=x-y; System.out.println(“Numbers after swapping”); System.out.println(“ number x is “ + x); System.out.println(“number y is “ +y); } } 10. If someone needs one specific format of bug reports from a tester, then what will be the best way or approach can take by the tester to provide the same?

Bug Summary

Reproduce steps

Expected behavior and current behavior of one specific bug

11. Explain in detail about different kinds of testing called Alpha and Beta?

The tester does alpha testing identified bugs before moving the product to a live environment or the end-user. The determination of beta bugs typically falls to the end-user, who represents the actual user or application of the product.

12. What is Risk-Based testing?

Risk-Based testing refers to testing the functionalities of a product by prioritizing them according to the importance of the deliverables. Risk-Based testing includes testing crucial product features that will have a business impact, and the probability of the failure of those features is very high. Based on the business requirement, we prioritize product functionalities and test them in the order of high-priority, followed by medium and low-priority functionalities. Risk-based testing will be performed when there is insufficient time to test all the product’s functionalities.

13. Normally, there are different categories available to make one specific group by of varieties test cases; give an explanation of them.


Some famous test cases in the current IT industry are below:

Functional Testing

Frontend or User interface testing

Performance Testing

Integration Testing

Load testing or User usability testing

Security Testing

14. Common challenge one software tester commonly faces is proper documentation not maintained for testing. In that case, how can we overcome the same?

One common scenario involves inadequate availability of documentation for all types of test cases. However, fulfilling the requirement and delivering it to the client on time remains necessary. In such cases, testers typically follow client-provided emails that accurately describe all the requirements. In an ideal scenario, the emails would contain screenshots of the application, clearly indicating the specific areas that require changes. Alternatively, testers may engage in Monday meetings or conduct verbal discussions with the client to grasp the exact functionality required for the changes fully. This approach allows for efficient testing and timely delivery within the expected timeline.

Recommended Articles

This has been a guide to the SDET Interview Questions and Answers list so that the candidate can crack down on these Questions easily. In this post, we have studied the top SDET Interview Questions often asked in interviews. You may also look at the following articles to learn more –

Top 26 Scala Interview Questions And Answer Updated For 2023

Introduction to Scala Interview Questions And Answers

Scala is a general-purpose programing language providing support for functional programming and a robust static type system. Martin Ordersky designed it, and it first appeared on 20 January 2004. The file extension is scala or .sc. Scala combines object-oriented and functional programming in one concise, high-level language. Scala’s static types help avoids bugs in complex applications, and its JVM and JavaScript runtimes let you build high-performance systems with easy access to huge ecosystems of libraries. It runs on Java platforms.

Start Your Free Software Development Course

Web development, programming languages, Software testing & others


For Compiling:  scala HelloWorld.scala

Running: scala HelloWorld.

So if you are looking for a job related to Scala, you must prepare for the 2023 Scala Interview Questions. Though every Scala interview is different and the job scope is also different, we can help you with the top Scala Interview Questions and Answers, which will help you take the leap and get you to succeed in an interview.

Part 1 – Scala Interview Questions (Basic)

This first part covers basic Scala Interview Questions and Answers.

1. What is Scala?



Give some examples of JVM Language. Answer:

Java, Scala, Groovy, and Closure are very popular for JVM.

3. What is the superclass of all classes in Scala?


“Any” class is the superclass of all types in Scala.

4. What is the default access modifier in Scala?


“Public” is the default access modifier in Scala.

5. What is similar between Scala Int and Java’s java.lang.integer?


This first part covers basic Scala Interview Questions and Answers.

6. What is Null in Scala?


Null is a Type in Scala. It is available in the Scala package as “scala. Null”.

Let us move to the following Scala Interview Questions And Answers.

7. What is the Unit in Scala?


In Scala, a unit represents “No value” or “No Useful value.” In the package, it is defined as “scala. Unit”.

8. What are the value and var in Scala?


Var stands for variable, and Val stands for value. Var is used to define. Mutable variables and matter can be reassigned after the creation of it. Val is used to define Immutable variables, which means the value cannot be reassigned once it’s created.

9. What is REPL in Scala?


REPL stands for reading Evaluate Print Loop. Generally, we call it “Ripple.” It is an interpreter to execute scala code from the command prompt.


val year = if( count == 0) 2014 else 2024


There are two types of maps: Mutable and Immutable.

The closure is the function in scale where the returned value of the function depends on one or more than one variable, which is defined outside the function.

Part 2 – Scala Interview Questions (Advanced)


It is used for wrapping the missing value.


trait MyTrait {



lang, scala, scala.PreDef is the package in Scala.

Let us move to the next Scala Interview Questions And Answers.


Scala tuple is used to combine the fixed number of the item together. Nature vice the tuple are immutable and can hold objects of different types. For Eg: Val myTuple = (1, “element”, 10.2)


A Monad is an object in Scala which wraps another object.



Literal identifiers


Multi-Line Stings


Scala 2.12, which requires Java 8.

Let us move to the next Scala Interview Questions And Answers.


def keyword is used to define the function in Scala.


An object is a singleton instance of the class. It does not need to be initiated by the developer.


Akka is a concurrency framework in Scala that uses Actor based model for building JVM applications.


Scala compiler scalac to compile Scala Program and scala command to run it.

Recommended Articles

We hope that this EDUCBA information on “Scala Interview Questions” was beneficial to you. You can view EDUCBA’s recommended articles for more information.

World Holidays (An 2023 Updated List)

World Holidays

Several days in the year are marked important for many reasons. Some days are recognized internationally, and others have a national significance. We are going to update the list of world holidays over here.

Among the world’s important days, a few hours are observed as public holidays, while others are celebrated through events, ceremonies, conferences, and seminars. It is important to understand the significance of these days in their true essence, not just as world holidays.

Updated List of  World Holidays 2023

Watch our Demo Courses and Videos

Valuation, Hadoop, Excel, Mobile Apps, Web Development & many more.


New Year January 1

World Braille day January 4

Tamil Thai Pongal day January 15

Korean New Year January 22

Vietnamese New Year January 22

Chinese New Year January 22

Thaipusam February 4

Laila al Miraj February 18

Tibetan New Year February 21

Lalat al Bara’a March 7

Purim March 7

International Women’s Day March 8

Ramadan Begins March 23

Palm sunday April 2

Mahavir Jayanti April 4

Qing Ming Jie April 5

Good Friday April 7

Nuzul Quran April 8

Easter April 9

Easter Monday April 10

Burmeses New Year April 17

Ed al-Fitr April 21

International Labour Day May 1

Whit Monday May 29

Corpus Christi June 8

Dragon Boat Festival June 22

Eid al Adha June 28

Hijra New Year July 19

Parsi New Year August 16

Onam Aug 29

Jewish New Year September 16

Prophet’s Birthday September 27

Baptism of Prophet October 4

Dassain October 24

All Saint’s Day November 1

Christmas December 25

New Year’s Eve December 31

Days Recognized by United Nations #1 World Water Day

On March 22, World Water Day is observed to bring awareness to the conservation and rejuvenation of water for a sustainable world.

The theme for World Water Day 2023 is ‘Accelerating Change’, which changes every year to focus on the upcoming goals.

Sustainable Development Goal 6 mentions clean water and sanitation, which is essential for the survival of life on Earth.

We all know that water is crucial for our sustenance and survival; hence, conserving water becomes necessary, and you must not neglect it.

We must understand and create awareness towards conserving water so that the future generation is not left in drought and struggle for their survival on planet Earth. Central countries worldwide use this day as a platform to organize events, conferences, and seminars to create awareness; hence, this can be added to the list of world holidays.

#2 International Women’s Day

International Women’s Day falls every year on March 8, celebrating womanhood in its true spirit.

It celebrates women’s progress and development in all social, cultural, and political fields.

It also points towards the gap between males and females, which needs early intervention to accelerate the growth of this world.

Growth in financial, social, and political fields is only possible when women have the power to make crucial decisions in their homes and working spaces.

#3 International Human Rights Day

For the proper functioning of actual democracy, rights become crucial for any nation.

To spread awareness of the importance of rights in the lives of all individuals, this day is recognized by the UN on December 10 every year with a different theme.

The theme for 2023 has yet to be decided; however, for 2023, the theme was Dignity, Freedom, and Justice For All.

Many human rights activists worldwide organize rallies, host seminars, and stream virtual events to make people realize their rights.

#4 World Food Day

World Food Day celebrates the establishment of FAO, an organization of the UN.

Every year on October 16, World Food Day is observed, which targets food security and hunger across the globe. The celebration in many countries counts as a world holiday.

#5 World Cities Day

On October 31, the United Nations came up with this agenda to promote urbanization at an international level.

Today’s cities are overflowing with people who lack sanitary living conditions and access to safe drinking water.

This day highlights urbanization problems and seeks solutions to them.

These special days are observed as world holidays, encompassing a large population in their boundary. The aim these days recognized by the UN is to realize goals that benefit all.

Effective Strategies For Handling Missing Values In Data Analysis (Updated 2023)


If you are aiming for a job as a data scientist, you must know how to handle the problem of missing values, which is quite common in many real-life datasets. Incomplete data can bias the results of the machine learning models and/or reduce the accuracy of the model. This article describes missing data, how it is represented, and the different reasons data values get missed. Along with the different categories of missing data, it also details out different ways of handling missing values with dataset examples.

Learning Objectives

In this tutorial, we will learn about missing values and the benefits of missing data analysis in data science.

You will learn about the different types of missing data and how to handle them correctly.

You will also learn about the most widely used imputation methods to handle incomplete data.

What Is a Missing Value?

Missing data is defined as the values or data that is not stored (or not present) for some variable/s in the given dataset. Below is a sample of the missing data from the Titanic dataset. You can see the columns ‘Age’ and ‘Cabin’ have some missing values.

Source: analyticsindiamag

How Is a Missing Value Represented in a Dataset?

In the dataset, the blank shows the missing values.

In Pandas, usually, missing values are represented by NaN. It stands for Not a Number.

Source: medium

The above image shows the first few records of the Titanic dataset extracted and displayed using Pandas.

Why Is Data Missing From the Dataset?

There can be multiple reasons why certain values are missing from the data. Reasons for the missing of data from the dataset affect the approach of handling missing data. So it’s necessary to understand why the data could be missing.

Some of the reasons are listed below:

Past data might get corrupted due to improper maintenance.

Observations are not recorded for certain fields due to some reasons. There might be a failure in recording the values due to human error.

The user has not provided the values intentionally

Item nonresponse: This means the participant refused to respond.

Types of Missing Values

Formally the missing values are categorized as follows:

Source: theblogmedia

Missing Completely At Random (MCAR)

In MCAR, the probability of data being missing is the same for all the observations. In this case, there is no relationship between the missing data and any other values observed or unobserved (the data which is not recorded) within the given dataset. That is, missing values are completely independent of other data. There is no pattern.

Missing At Random (MAR)

MAR data means that the reason for missing values can be explained by variables on which you have complete information, as there is some relationship between the missing data and other values/data. In this case, the data is not missing for all the observations. It is missing only within sub-samples of the data, and there is some pattern in the missing values.

For example, if you check the survey data, you may find that all the people have answered their ‘Gender,’ but ‘Age’ values are mostly missing for people who have answered their ‘Gender’ as ‘female.’ (The reason being most of the females don’t want to reveal their age.)

So, the probability of data being missing depends only on the observed value or data. In this case, the variables ‘Gender’ and ‘Age’ are related. The reason for missing values of the ‘Age’ variable can be explained by the ‘Gender’ variable, but you can not predict the missing value itself.

Suppose a poll is taken for overdue books in a library. Gender and the number of overdue books are asked in the poll. Assume that most of the females answer the poll and men are less likely to answer. So why the data is missing can be explained by another factor, that is gender. In this case, the statistical analysis might result in bias. Getting an unbiased estimate of the parameters can be done only by modeling the missing data.

Missing Not At Random (MNAR)

Missing values depend on the unobserved data. If there is some structure/pattern in missing data and other observed data can not explain it, then it is considered to be Missing Not At Random (MNAR).

If the missing data does not fall under the MCAR or MAR, it can be categorized as MNAR. It can happen due to the reluctance of people to provide the required information. A specific group of respondents may not answer some questions in a survey.

For example, suppose the name and the number of overdue books are asked in the poll for a library. So most of the people having no overdue books are likely to answer the poll. People having more overdue books are less likely to answer the poll. So, in this case, the missing value of the number of overdue books depends on the people who have more books overdue.

Another example is that people having less income may refuse to share some information in a survey or questionnaire.

In the case of MNAR as well, the statistical analysis might result in bias.

Why Do We Need to Care About Handling Missing Data?

It is important to handle the missing values appropriately.

Many machine learning algorithms fail if the dataset contains missing values. However, algorithms like K-nearest and Naive Bayes support data with missing values.

You may end up building a biased machine learning model, leading to incorrect results if the missing values are not handled properly.

Missing data can lead to a lack of precision in the statistical analysis.

Practice Problem

Let’s take an example of the Loan Prediction Practice Problem from Analytics Vidhya. You can download the dataset from the following link.

Checking for Missing Values in Python

The first step in handling missing values is to carefully look at the complete data and find all the missing values. The following code shows the total number of missing values in each column. It also shows the total number of missing values in the entire data set.

From the above output, we can see that there are 6 columns – Gender, Married, Dependents, Self_Employed, LoanAmount, Loan_Amount_Term, and Credit_History having missing values.

IN: #Find the total number of missing values from the entire dataset train_df.isnull().sum().sum() OUT: 149

There are 149 missing values in total.

List of Methods to handle missing values in a dataset

Here is a list of popular strategies to handle missing values in a dataset

Deleting the Missing Values

Imputing the Missing Values

Imputing the Missing Values for Categorical Features

Imputing the Missing Values using Sci-kit Learn Library

Using “Missingness” as a Feature

Handling Missing Values

Now that you have found the missing data, how do you handle the missing values?

Analyze each column with missing values carefully to understand the reasons behind the missing of those values, as this information is crucial to choose the strategy for handling the missing values.

There are 2 primary ways of handling missing values:

Deleting the Missing values

Imputing the Missing Values

Deleting the Missing value

Generally, this approach is not recommended. It is one of the quick and dirty techniques one can use to deal with missing values. If the missing value is of the type Missing Not At Random (MNAR), then it should not be deleted.

If the missing value is of type Missing At Random (MAR) or Missing Completely At Random (MCAR) then it can be deleted (In the analysis, all cases with available data are utilized, while missing observations are assumed to be completely random (MCAR) and addressed through pairwise deletion.)

There are 2 ways one can delete the missing data values:

Deleting the entire row (listwise deletion)

If a row has many missing values, you can drop the entire row. If every row has some (column) value missing, you might end up deleting the whole data. The code to drop the entire row is as follows:

IN: df = train_df.dropna(axis=0) df.isnull().sum() OUT: Loan_ID 0 Gender 0 Married 0 Dependents 0 Education 0 Self_Employed 0 ApplicantIncome 0 CoapplicantIncome 0 LoanAmount 0 Loan_Amount_Term 0 Credit_History 0 Property_Area 0 Loan_Status 0 dtype: int64

Deleting the entire column

If a certain column has many missing values, then you can choose to drop the entire column. The code to drop the entire column is as follows:

IN: df = train_df.drop(['Dependents'],axis=1) df.isnull().sum() OUT: Loan_ID 0 Gender 13 Married 3 Education 0 Self_Employed 32 ApplicantIncome 0 CoapplicantIncome 0 LoanAmount 22 Loan_Amount_Term 14 Credit_History 50 Property_Area 0 Loan_Status 0 dtype: int64 Imputing the Missing Value

There are many imputation methods for replacing the missing values. You can use different python libraries such as Pandas, and Sci-kit Learn to do this. Let’s go through some of the ways of replacing the missing values.

Replacing with an arbitrary value

If you can make an educated guess about the missing value, then you can replace it with some arbitrary value using the following code. E.g., in the following code, we are replacing the missing values of the ‘Dependents’ column with ‘0’.

IN: #Replace the missing value with '0' using 'fiilna' method train_df['Dependents'] = train_df['Dependents'].fillna(0) train_df[‘Dependents'].isnull().sum() OUT: 0

Replacing with the mean

This is the most common method of imputing missing values of numeric columns. If there are outliers, then the mean will not be appropriate. In such cases, outliers need to be treated first. You can use the ‘fillna’ method for imputing the columns ‘LoanAmount’ and ‘Credit_History’ with the mean of the respective column values.

IN: #Replace the missing values for numerical columns with mean train_df['LoanAmount'] = train_df['LoanAmount'].fillna(train_df['LoanAmount'].mean()) train_df['Credit_History'] = train_df[‘Credit_History'].fillna(train_df['Credit_History'].mean()) OUT: Loan_ID 0 Gender 13 Married 3 Dependents 15 Education 0 Self_Employed 32 ApplicantIncome 0 CoapplicantIncome 0 LoanAmount 0 Loan_Amount_Term 0 Credit_History 0 Property_Area 0 Loan_Status 0 dtype: int64

Replacing with the mode

Mode is the most frequently occurring value. It is used in the case of categorical features. You can use the ‘fillna’ method for imputing the categorical columns ‘Gender,’ ‘Married,’ and ‘Self_Employed.’

IN: #Replace the missing values for categorical columns with mode train_df['Gender'] = train_df['Gender'].fillna(train_df['Gender'].mode()[0]) train_df['Married'] = train_df['Married'].fillna(train_df['Married'].mode()[0]) train_df['Self_Employed'] = train_df[‘Self_Employed'].fillna(train_df['Self_Employed'].mode()[0]) train_df.isnull().sum() OUT: Loan_ID 0 Gender 0 Married 0 Dependents 0 Education 0 Self_Employed 0 ApplicantIncome 0 CoapplicantIncome 0 LoanAmount 0 Loan_Amount_Term 0 Credit_History 0 Property_Area 0 Loan_Status 0 dtype: int64

Replacing with the median

The median is the middlemost value. It’s better to use the median value for imputation in the case of outliers. You can use the ‘fillna’ method for imputing the column ‘Loan_Amount_Term’ with the median value.

train_df['Loan_Amount_Term']= train_df['Loan_Amount_Term'].fillna(train_df['Loan_Amount_Term'].median())

Replacing with the previous value – forward fill

In some cases, imputing the values with the previous value instead of the mean, mode, or median is more appropriate. This is called forward fill. It is mostly used in time series data. You can use the ‘fillna’ function with the parameter ‘method = ffill’

IN: import pandas as pd import numpy as np test = pd.Series(range(6)) test.loc[2:4] = np.nan test OUT: 0 0.0 1 1.0 2 Nan 3 Nan 4 Nan 5 5.0 dtype: float64 IN: # Forward-Fill test.fillna(method=‘ffill') OUT: 0 0.0 1 1.0 2 1.0 3 1.0 4 1.0 5 5.0 dtype: float64

Replacing with the next value – backward fill

In backward fill, the missing value is imputed using the next value.

IN: # Backward-Fill test.fillna(method=‘bfill') OUT: 0 0.0 1 1.0 2 5.0 3 5.0 4 5.0 5 5.0 dtype: float64


Missing values can also be imputed using interpolation. Pandas’ interpolate method can be used to replace the missing values with different interpolation methods like ‘polynomial,’ ‘linear,’ and ‘quadratic.’ The default method is ‘linear.’

IN: test.interpolate() OUT: 0 0.0 1 1.0 2 2.0 3 3.0 4 4.0 5 5.0 dtype: float64 How to Impute Missing Values for Categorical Features?

There are two ways to impute missing values for categorical features as follows:

Impute the Most Frequent Value

We will use ‘SimpleImputer’ in this case, and as this is a non-numeric column, we can’t use mean or median, but we can use the most frequent value and constant.

IN: import pandas as pd import numpy as np X = pd.DataFrame({'Shape':['square', 'square', 'oval', 'circle', np.nan]}) X Shape OUT: 0 square 1 square 2 oval 3 circle 4 NaN IN: from sklearn.impute import SimpleImputer imputer = SimpleImputer(strategy='most_frequent') imputer.fit_transform(X) OUT: array([['square'], ['square'], ['oval'], ['circle'], ['square']], dtype=object)

As you can see, the missing value is imputed with the most frequent value, ’square.’

Impute the Value “Missing”

We can impute the value “missing,” which treats it as a separate category.

IN: imputer = SimpleImputer(strategy='constant', fill_value='missing') imputer.fit_transform(X) OUT: array([['square'], ['square'], ['oval'], ['circle'], ['missing']], dtype=object)

In any of the above approaches, you will still need to OneHotEncode the data (or you can also use another encoder of your choice). After One Hot Encoding, in case 1, instead of the values ‘square,’ ‘oval,’ and’ circle,’ you will get three feature columns. And in case 2, you will get four feature columns (4th one for the ‘missing’ category). So it’s like adding the missing indicator column in the data. There is another way to add a missing indicator column, which we will discuss further.

How to Impute Missing Values Using Sci-kit Learn Library?

We can impute missing values using the sci-kit library by creating a model to predict the observed value of a variable based on another variable which is known as regression imputation.

Univariate Approach

In a Univariate approach, only a single feature is taken into consideration. You can use the class SimpleImputer and replace the missing values with mean, mode, median, or some constant value.

Let’s see an example:

IN: import numpy as np from sklearn.impute import SimpleImputer imp = SimpleImputer(missing_values=np.nan, strategy='mean')[[1, 2], [np.nan, 3], [7, 6]]) OUT: SimpleImputer() IN: X = [[np.nan, 2], [6, np.nan], [7, 6]] print(imp.transform(X)) OUT: [[4. 2. ] [6. 3.666...] [7. 6. ]] Multivariate Approach

In a multivariate approach, more than one feature is taken into consideration. There are two ways to impute missing values considering the multivariate approach. Using KNNImputer or IterativeImputer classes.

Let’s take an example of a titanic dataset.

Suppose the feature ‘age’ is well correlated with the feature ‘Fare’ such that people with lower fares are also younger and people with higher fares are also older. In that case, it would make sense to impute low age for low fare values and high age for high fare values. So here, we are taking multiple features into account by following a multivariate approach.

IN: import pandas as pd cols = ['SibSp', 'Fare', 'Age'] X = df[cols] X


IN: from sklearn.experimental import enable_iterative_imputer from sklearn.impute import IterativeImputer impute_it = IterativeImputer() impute_it.fit_transform(X) OUT: array([[ 1. , 7.25 , 22. ], [ 1. , 71.2833 , 38. ], [ 0. , 7.925 , 26. ], [ 1. , 53.1 , 35. ], [ 0. , 8.05 , 35. ], [ 0. , 8.4583 , 28.50639495]])

Let’s see how IterativeImputer works. For all rows in which ‘Age’ is not missing, sci-kit learn runs a regression model. It uses ‘Sib sp’ and ‘Fare’ as the features and ‘Age’ as the target. And then, for all rows for which ‘Age’ is missing, it makes predictions for ‘Age’ by passing ‘Sib sp’ and ‘Fare’ to the training model. So it actually builds a regression model with two features and one target and then makes predictions on any places where there are missing values. And those predictions are the imputed values.

Nearest Neighbors Imputations (KNNImputer)

Missing values are imputed using the k-Nearest Neighbors approach, where a Euclidean distance is used to find the nearest neighbors. Let’s take the above example of the titanic dataset to see how it works.

IN: from sklearn.impute import KNNImputer impute_knn = KNNImputer(n_neighbors=2) impute_knn.fit_transform(X) OUT: array([[ 1. , 7.25 , 22. ], [ 1. , 71.2833, 38. ], [ 0. , 7.925 , 26. ], [ 1. , 53.1 , 35. ], [ 0. , 8.05 , 35. ], [ 0. , 8.4583, 30.5 ]])

In the above example, the n_neighbors=2. So sci-kit learn finds the two most similar rows measured by how close the ‘Sib sp’ and ‘Fare’ values are to the row which has missing values. In this case, the last row has a missing value. And the third row and the fifth row have the closest values for the other two features. So the average of the ‘Age’ feature from these two rows is taken as the imputed value.

How to Use “Missingness” as a Feature?

In some cases, while imputing missing values, you can preserve information about which values were missing and use that as a feature. This is because sometimes, there may be a relationship between the reason for missing values (also called the “missingness”) and the target variable you are trying to predict. In such cases, you can add a missing indicator to encode the “missingness” as a feature in the imputed data set.

Where can we use this?

Suppose you are predicting the presence of a disease. Now, imagine a scenario where a missing age is a good predictor of the disease because we don’t have records for people in poverty. The age values are not missing at random. They are missing for people in poverty, and poverty is a good predictor of disease. Thus, missing age or “missingness” is a good predictor of disease.

IN: import pandas as pd import numpy as np X = pd.DataFrame({'Age':[20, 30, 10, chúng tôi 10]}) X


IN: from sklearn.impute import SimpleImputer # impute the mean imputer = SimpleImputer() imputer.fit_transform(X) OUT: array([[20. ], [30. ], [10. ], [17.5], [10. ]]) IN: imputer = SimpleImputer(add_indicator=True) imputer.fit_transform(X) OUT: array([[20. , 0. ], [30. , 0. ], [10. , 0. ], [17.5, 1. ], [10. , 0. ]])

In the above example, the second column indicates whether the corresponding value in the first column was missing or not. ‘1’ indicates that the corresponding value was missing, and ‘0’ indicates that the corresponding value was not missing.

If you don’t want to impute missing values but only want to have the indicator matrix, then you can use the ‘MissingIndicator’ class from scikit learn.


Key Takeaways

It is critical to reduce the potential bias in the machine learning models and get a precise statistical analysis of the data. Handling missing values is one of the challenges of data analysis.

Understanding different categories of missing data help in making decisions on how to handle it. We explored different categories of missing data and the different ways of handling it in this article.

Frequently Asked Questions

Q1. What are the types of missing values in data?

A. The three types of missing data are Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR).

Q2. How do you handle missing values?

A. We can use different methods to handle missing data points, such as dropping missing values, imputing them using machine learning, or treating missing values as a separate category.

Q3. How does pairwise deletion handle missing data?

A. Pairwise deletion is a method of handling missing values where only the observations with complete data are used in each pairwise correlation or regression analysis. This method assumes that the missing data is MCAR, and it is appropriate when the missing data is not too large.

The media shown in this article is not owned by Analytics Vidhya and are used at the Author’s discretion.


blogathonHandling Missing ValuesMissing Datamissing value imputationMissing Values Treatment

Recommended For You

Update the detailed information about Most Useful Terminal Commands For Macos (2023 Updated) on the website. We hope the article's content will meet your needs, and we will regularly update the information to provide you with the fastest and most accurate information. Have a great day!