{"id":265843,"date":"2024-08-30T17:55:13","date_gmt":"2024-08-30T17:55:13","guid":{"rendered":"https:\/\/imarticus.org\/blog\/?p=265843"},"modified":"2024-08-30T17:55:13","modified_gmt":"2024-08-30T17:55:13","slug":"missing-value-treatment","status":"publish","type":"post","link":"https:\/\/imarticus.org\/blog\/missing-value-treatment\/","title":{"rendered":"Understanding Missing Values: Types, Causes, and Impacts on Data Analysis"},"content":{"rendered":"<p><span style=\"font-weight: 400;\">&#8220;Missing values in data analysis&#8221; refers to values that are absent from a given dataset or were not recorded for a certain variable. In this post, we will take a voyage through the complex terrain of handling missing data, a critical part of data pre-processing that requires accuracy and imagination. We&#8217;ll learn about the causes and types of missingness, as well as missing value treatment.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Common Causes of Missing Values in Data Analysis<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Missing data affects all data-related professions and can lead to a number of challenges, such as lower performance, data processing difficulties, and biased conclusions caused by systematic differences between complete and incomplete records. 
Some of the probable causes of missing data are:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Human errors during data collection and entry<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Equipment or software malfunctions causing machine errors<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Participant drop-outs from the study<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Respondents refusing to answer certain questions<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Study duration and nature<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Data loss during transmission and conversion<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Integrating unrelated datasets<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Frequent missingness can reduce overall statistical power and introduce bias into estimates. The relevance of missing values is determined by the amount of missing data, its pattern, and the process that caused it. A deliberate strategy is therefore always necessary when dealing with missing data, as poor handling can produce significantly biased study results and lead to inaccurate conclusions.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Types of Missing Values in Data Analysis and Their Impacts<\/span><\/h2>\n<h3><span style=\"font-weight: 400;\">MCAR or Missing Completely at Random<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">In MCAR, missingness has no relationship with either observed or unobserved values in the dataset. 
Simply put, the data is missing at random, with no clear pattern.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">A classic example of MCAR occurs when a survey participant inadvertently misses a question. The chance of data being absent is independent of any other information in the dataset. This mechanism is regarded as the least problematic for data analysis since it introduces no bias.<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">MAR or Missing at Random<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">In MAR, the missingness can be explained by some of the observed variables in the dataset. Although the data is missing systematically, it is still deemed random since, once those observed variables are accounted for, the missingness has no relationship to the unobserved values.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">For example, in tobacco research, younger individuals may answer questions about their smoking less frequently (independent of their smoking status), resulting in systematic missingness due to age.<\/span><\/p>\n<h3><span style=\"font-weight: 400;\">MNAR or Missing Not at Random<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">MNAR happens when the missingness is linked to the unobserved data. 
In this situation, the probability that a value is missing depends on the missing value itself, even after the observed variables are taken into account.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Returning to the tobacco research example, individuals who smoke the most may purposefully conceal their smoking habits, resulting in systematic missingness that depends on the unreported values themselves.<\/span><\/p>\n<h2><span style=\"font-weight: 400;\">Treatment of Missing Values: Approaches for Handling<\/span><\/h2>\n<p><span style=\"font-weight: 400;\">Three commonly used approaches to missing data are:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Deletion method<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Imputation method<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Model-based method<\/span><\/li>\n<\/ul>\n<p><span style=\"font-weight: 400;\">Each of these methods can be further categorised; for example, deletion can be listwise or pairwise, and imputation can be single or multiple.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">Choosing the right treatment depends on several factors:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Type of missing data: MCAR, MAR, or MNAR<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Proportion of missing values<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Data type and distribution<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Analytical objectives and assumptions<\/span><\/li>\n<\/ul>\n<h2><span style=\"font-weight: 400;\">Implications of the Different Types of Missing Data<\/span><\/h2>\n<p><b>MCAR:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">MCAR data can be handled efficiently with the help of simple methods such as listwise deletion or mean 
imputation, without biasing estimates of the mean, although deletion shrinks the sample and mean imputation understates the variance;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Statistical results derived from MCAR data are usually unbiased, though less precise than results from complete data.<\/span><\/li>\n<\/ul>\n<p><b>MAR:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">MAR data requires more sophisticated handling techniques such as multiple imputation or maximum likelihood estimation;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Failing to account for MAR properly may introduce bias and undermine the validity of statistical analyses.<\/span><\/li>\n<\/ul>\n<p><b>MNAR:<\/b><\/p>\n<ul>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">MNAR data is the most difficult to handle, as the reasons for missingness are not captured within the observed data;<\/span><\/li>\n<li style=\"font-weight: 400;\" aria-level=\"1\"><span style=\"font-weight: 400;\">Traditional imputation methods may not be valid for MNAR data, and specialised techniques that model the missingness mechanism itself, such as selection models or pattern-mixture models, are required.<\/span><\/li>\n<\/ul>\n<h3><span style=\"font-weight: 400;\">Final Words<\/span><\/h3>\n<p><span style=\"font-weight: 400;\">Understanding the factors that cause missing data is critical for any data scientist or analyst. Each mechanism &#8211; MCAR, MAR, and MNAR &#8211; has particular challenges and consequences for data processing.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">As data scientists, we must identify the likely missingness mechanism and apply appropriate imputation or handling procedures. Failure to treat missing data appropriately can jeopardise the integrity of analysis and lead to incorrect results. 
The influence of missing data can be reduced by choosing a strategy that matches the missingness mechanism.<\/span><\/p>\n<p><span style=\"font-weight: 400;\">To learn more about data science and analytics concepts, enrol in the <\/span><a href=\"https:\/\/imarticus.org\/postgraduate-program-in-data-science-analytics\/?utm_source=google&amp;utm_medium=cpc&amp;utm_campaign=14988268718&amp;utm_campaignname=Imarticus%20BRAND%20-%20Delhi&amp;utm_term=data%20science%20course%20imarticus&amp;utm_adgroup=imarticus_DataScience&amp;utm_campaigntype=search&amp;gad_source=1&amp;gclid=CjwKCAjwuMC2BhA7EiwAmJKRrG7jb3bFE7TIOynve81qHHTlgyGQ-nCE0GNAtnOqsO6xBy8HDTR_3BoCoj8QAvD_BwE\"><span style=\"font-weight: 400;\">data science course<\/span><\/a><span style=\"font-weight: 400;\"> by Imarticus.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>&#8220;Missing values in data analysis&#8221; refers to values or data that are missing from a given dataset or are not recorded for a certain variable. In this post, we will take a voyage through the complex terrain of handling missing data, a critical part of data pre-processing that requires accuracy and imagination. 
We&#8217;ll learn about [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":265844,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_acf_changed":false,"_mo_disable_npp":"","_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[4528],"tags":[],"class_list":["post-265843","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-data-science-and-alayitcs"],"acf":[],"aioseo_notices":[],"modified_by":"Imarticus Learning","_links":{"self":[{"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/posts\/265843","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/comments?post=265843"}],"version-history":[{"count":1,"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/posts\/265843\/revisions"}],"predecessor-version":[{"id":265845,"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/posts\/265843\/revisions\/265845"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/media\/265844"}],"wp:attachment":[{"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/media?parent=265843"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/categories?post=265843"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/imarticus.org\/blog\/wp-json\/wp\/v2\/tags?post=265843"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}