
Wednesday, June 5, 2019

Application of Regression Analysis

Chapter 3

Methodology

In applications of regression analysis, the data set often contains unusual observations, which are either outliers (noise) or influential observations. These observations can have a disproportionate effect on the estimates of the regression coefficients and on the whole regression analysis, and can become the source of misleading results and interpretations. It is therefore very important to examine these suspect observations carefully and to decide whether they should be included in or removed from the analysis.

In regression analysis, a basic step is to determine whether one or more observations unduly influence the results and interpretations of the analysis. If the regression has a single independent variable, unusual observations in the dependent and independent variables are easy to detect with a scatter plot, box plot, residual plot and so forth. Graphical identification of outliers and/or influential observations is, however, a subjective approach. It is also well known that in the presence of multiple outliers there can be a masking or swamping effect. Masking (false negative) occurs when an outlying subset remains undetected due to the presence of another, usually adjacent, subset. Swamping (false positive) occurs when a usual observation is incorrectly identified as an outlier in the presence of another, usually remote, subset of observations.

In the present study, some well-known diagnostics are compared for identifying multiple influential observations.
For this purpose, robust regression methods are first used to identify influential observations in Poisson regression. Then, to confirm that the observations identified by the robust regression methods are genuinely influential, some diagnostic measures based on the single case deletion approach are considered: Pearson chi-square, deviance residual, hat matrix, likelihood residual test, Cook's distance, difference of fits and squared difference in beta. In the presence of masking and swamping, however, diagnostics based on single case deletion fail to identify outliers and influential observations. Therefore, to remove or reduce the masking and swamping phenomena, some group deletion approaches are taken: generalized standardized Pearson residual, generalized difference of fits and generalized squared difference in beta.

3.2 Diagnostic measures based on single case deletion

This section presents the details of the single-case-deletion measures used to identify multiple influential observations in the Poisson regression model. These measures are change in Pearson chi-square, change in deviance, hat matrix, likelihood residual test, Cook's distance, difference of fits (DFFITS) and squared difference in beta (SDFBETA).

Pearson chi-square

To assess the amount of change in the Poisson regression estimates that would occur if the kth observation were deleted, the Pearson $\chi^2$ statistic is proposed for detecting outliers. Such diagnostic statistics examine the effect of deleting a single case on the overall summary measures of fit. Let $\chi^2$ denote the Pearson $\chi^2$ statistic and $\chi^2_{(-k)}$ denote the statistic after case k is deleted, using the one-step linear approximation given by Pregibon (1981).
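The decrease in the value of the statistic due to deletion of the kth case is

$$\Delta\chi^2_k = \chi^2 - \chi^2_{(-k)}, \qquad k = 1, 2, 3, \ldots, n \tag{3.1}$$

where $\chi^2$ is defined as

$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i} \tag{3.2}$$

and, for the kth deleted case,

$$\chi^2_{(-k)} = \sum_{i \neq k} \frac{(y_i - \hat{\mu}_{i(-k)})^2}{\hat{\mu}_{i(-k)}} \tag{3.3}$$

Here $\hat{\mu}_{i(-k)}$ is the fitted mean of the ith case when the model is estimated without case k.

Deviance residual

The one-step linear approximation for the change in deviance when the kth case is deleted is

$$\Delta D_k = D - D_{(-k)} \tag{3.4}$$

Because the deviance measures the goodness of fit of a model, a substantial decrease in the deviance after deletion of the kth observation indicates that this observation is a misfit. The deviance of the Poisson regression with the kth observation included is

$$D = 2 \sum_{i=1}^{n} \left[ y_i \log\frac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i) \right] \tag{3.5}$$

where $\hat{\mu}_i = \exp(x_i^T \hat{\beta})$, and

$$D_{(-k)} = 2 \sum_{i \neq k} \left[ y_i \log\frac{y_i}{\hat{\mu}_{i(-k)}} - (y_i - \hat{\mu}_{i(-k)}) \right] \tag{3.6}$$

A larger value of $\Delta D_k$ indicates that the kth value is an outlier.

Hat matrix

The hat matrix is used in residual diagnostics to measure the influence of each observation. The hat values, $h_{ii}$, are the diagonal entries of the hat matrix, which is calculated using

$$H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2} \tag{3.7}$$

where $V = \mathrm{diag}\left[\mathrm{var}(y_i)\,(g'(\mu_i))^2\right]^{-1}$ and $\mathrm{var}(y_i) = E(y_i) = \mu_i$. In the Poisson regression model, $\eta_i = g(\mu_i) = x_i^T \beta$, where $g(\cdot)$ is usually called the link function. With the log link in Poisson regression, $\eta_i = \log \mu_i = x_i^T \beta$, so that $g'(\mu_i) = 1/\mu_i$ and

$$V = \mathrm{diag}(\hat{\mu}_i) \tag{3.8}$$

$(X^T V X)^{-1}$ is the estimated covariance matrix of $\hat{\beta}$, and $h_{ii}$ is the ith diagonal element of the hat matrix H. The diagonal elements of the hat matrix, i.e. the leverage values, satisfy $0 \le h_{ii} \le 1$ and $\sum_{i=1}^{n} h_{ii} = k$, where k is the number of parameters of the regression model including the intercept term. An observation is said to be influential if $h_{ii} > ck/n$, where c is a suitably chosen constant such as 2 or 3. Using the twice-the-mean rule of thumb suggested by Hoaglin and Welsch (1978), an observation with $h_{ii} > 2k/n$ is considered influential.

Likelihood residual test

For the detection of outliers, Williams (1987) introduced the likelihood residual.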
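The squared likelihood residual is a weighted average of the squared standardized deviance and Pearson residuals, defined as

$$t_i^2 = h_{ii}\, r_{SP_i}^2 + (1 - h_{ii})\, r_{SD_i}^2 \tag{3.9}$$

It is approximately equal to the likelihood ratio statistic for testing whether an observation is an outlier, and $t_i$ is also called the approximate studentized residual. Here $r_{SP_i}$ is the standardized Pearson residual, defined as

$$r_{SP_i} = \frac{y_i - \hat{\mu}_i}{\sqrt{\hat{\mu}_i (1 - h_{ii})}} \tag{3.10}$$

and $r_{SD_i}$ is the standardized deviance residual, defined as

$$r_{SD_i} = \frac{d_i}{\sqrt{1 - h_{ii}}}, \qquad d_i = \mathrm{sign}(y_i - \hat{\mu}_i) \sqrt{2 \left[ y_i \log\frac{y_i}{\hat{\mu}_i} - (y_i - \hat{\mu}_i) \right]} \tag{3.11}$$

where $d_i$ is called the deviance residual. It is another popular residual because the sum of squares of these residuals is the deviance statistic. Because the average value, k/n, of $h_{ii}$ is small, $t_i$ is much closer to $r_{SD_i}$ than to $r_{SP_i}$, and is therefore also approximately normally distributed. An observation is considered to be influential if its likelihood residual exceeds the cut-off t(1 […], n […]).

Difference of fits test (DFFITS)

The difference of fits test for Poisson regression is defined as

$$(\mathrm{DFFITS})_i = \frac{\hat{y}_i - \hat{y}_{i(-i)}}{\hat{\sigma}_{(-i)} \sqrt{h_{ii}}}, \qquad i = 1, 2, 3, \ldots, n \tag{3.12}$$

where $\hat{y}_{i(-i)}$ and $\hat{\sigma}_{(-i)}$ are respectively the ith fitted response and the estimated standard error with the ith observation deleted. DFFITS can be expressed in terms of standardized Pearson residuals and leverage values as

(DFFITS)i = […]   (3.13)

An observation is said to be influential if the magnitude of its DFFITS value reaches the cut-off 2 […].

Cook's distance

Cook (1977) suggests a statistic that measures the change in the parameter estimates caused by deleting each observation, defined as

CDi = […]   (3.14)

where $\hat{\beta}_{(-i)}$ is the estimate of the parameter vector $\beta$ without the ith observation.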
There is also a relationship between the difference of fits test and Cook's distance, which can be expressed as

CDi = […]   (3.15)

Using the approximation suggested by Pregibon, Cook's distance can be expressed as

CDi = […]   (3.16)

An observation with a CD value greater than 1 is treated as influential.

Squared difference in beta (SDFBETA)

This measure originates from the idea of Cook's distance (1977), is based on single case deletion, and modifies DFBETA (Belsley et al., 1980). It is defined as

(SDFBETA)i = […]   (3.17)

After some necessary calculation, SDFBETA can be related to DFFITS as

(SDFBETA)i = […]   (3.18)

The ith observation is influential if (SDFBETA)i […].

3.3 Diagnostic measures based on group deletion approach

This section presents the details of the group-deletion measures used to identify multiple influential observations in the Poisson regression model. Multiple influential observations can distort the fit and can create masking or swamping effects. Diagnostics based on group deletion are effective for identifying multiple influential observations and are free from the masking and swamping effects in the data. These measures are the generalized standardized Pearson residual (GSPR), generalized difference of fits (GDFFITS) and generalized squared difference in beta (GSDFBETA).

3.3.1 Generalized standardized Pearson residual (GSPR)

Imon and Hadi (2008) introduced the GSPR to identify multiple outliers. It is defined as

[…] for i […]   (3.19)
= […] for i […]   (3.20)

where […] are respectively the diagonal elements of V and H (the hat matrix) of the remaining group.
Observations with |GSPR| > 3 are considered outliers.

3.3.2 Generalized difference of fits (GDFFITS)

The GDFFITS statistic can be expressed in terms of the GSPR (generalized standardized Pearson residual) and the GWs (generalized weights). The GWs are denoted by […] and defined as

[…] for i […]   (3.21)
= […] for i […]   (3.22)

A value of the GWs larger than Median(GWs) + […] MAD(GWs) is considered influential. Finally, GDFFITS is defined as

(GDFFITS)i = […]   (3.23)

An observation is considered influential if |(GDFFITS)i| ≥ 3 […].

3.3.3 Generalized squared difference in beta (GSDFBETA)

In order to identify multiple outliers in the data set and to overcome the masking and swamping effects, GSDFBETA is defined as

GSDFBETAi = […] for i […]   (3.24)
= […] for i […]   (3.25)

GSDFBETA can be re-expressed in terms of the GSPR and the GWs as

GSDFBETAi = […] for i […]   (3.26)
= […] for i […]   (3.27)

A suggested cut-off value for the detection of influential observations is GSDFBETAi […].
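
As a rough illustration of the single-case quantities in Section 3.2, the sketch below fits a log-link Poisson regression and computes the leverages from (3.7) and (3.8) with V = diag(fitted means), the standardized Pearson and deviance residuals, the likelihood residual of Williams (1987), and a Cook's distance. Several things here are assumptions rather than the author's method: the function names `fit_poisson_irls` and `single_case_diagnostics`, the iteratively reweighted least squares fit and its starting values, the synthetic data, and the Cook's distance, which uses the commonly quoted one-step form r² h / (k (1 − h)) and may not match the chapter's own expression.

```python
import numpy as np


def fit_poisson_irls(X, y, n_iter=100, tol=1e-10):
    """Fit a log-link Poisson regression by iteratively reweighted least squares."""
    # Start from a least-squares fit to log(y + 0.5); an illustrative choice.
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu              # working response
        XtV = X.T * mu                       # X^T V with V = diag(mu)
        beta_new = np.linalg.solve(XtV @ X, XtV @ z)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta


def single_case_diagnostics(X, y):
    """Leverages, standardized residuals, likelihood residual, Cook's distance."""
    n, k = X.shape
    beta = fit_poisson_irls(X, y)
    mu = np.exp(X @ beta)

    # h_ii: diagonal of H = V^{1/2} X (X^T V X)^{-1} X^T V^{1/2}, V = diag(mu).
    A = np.sqrt(mu)[:, None] * X
    h = np.einsum("ij,jk,ik->i", A, np.linalg.inv(A.T @ A), A)

    # Standardized Pearson residual.
    r_sp = (y - mu) / np.sqrt(mu * (1.0 - h))

    # Deviance residual; the term y * log(y / mu) is 0 when y == 0.
    ylog = y * np.log(np.where(y > 0, y, 1.0) / mu)
    unit_dev = np.maximum(2.0 * (ylog - (y - mu)), 0.0)
    d = np.sign(y - mu) * np.sqrt(unit_dev)
    r_sd = d / np.sqrt(1.0 - h)

    # Likelihood residual: weighted combination of the two squared residuals.
    t = np.sign(y - mu) * np.sqrt(h * r_sp**2 + (1.0 - h) * r_sd**2)

    # One-step Cook's distance (assumed form, see the note above the code).
    cook = r_sp**2 * h / (k * (1.0 - h))

    return {"beta": beta, "mu": mu, "h": h, "r_sp": r_sp, "r_sd": r_sd,
            "t": t, "cook": cook, "high_leverage": h > 2.0 * k / n}


# Synthetic data (invented for illustration) with one planted remote case.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0, size=40)
y = rng.poisson(np.exp(0.5 + 0.3 * x)).astype(float)
x[0], y[0] = 6.0, 40.0
X = np.column_stack([np.ones_like(x), x])

out = single_case_diagnostics(X, y)
print("flagged by h_ii > 2k/n:", np.flatnonzero(out["high_leverage"]))
print("largest Cook's distance at case", int(np.argmax(out["cook"])))
```

The leverage flag uses the twice-the-mean rule h_ii > 2k/n quoted from Hoaglin and Welsch (1978). Because these are single-case quantities, they remain exposed to the masking and swamping effects that motivate the group deletion measures of Section 3.3.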
