In this question we will explore and show some nice properties of Generalized Linear Models, specifically...
Question:
In this question we will explore and show some nice properties of Generalized Linear Models, specifically those related to their use of Exponential Family distributions to model the output. Most commonly, GLMs are trained by using the negative log-likelihood (NLL) as the loss function. This is mathematically equivalent to Maximum Likelihood Estimation (i.e., maximizing the log-likelihood is equivalent to minimizing the negative log-likelihood). In this problem, our goal is to show that the NLL loss of a GLM is a convex function w.r.t. the model parameters. As a reminder, this is convenient because a convex function is one for which any local minimum is also a global minimum, and there is extensive research on how to optimize various types of convex functions efficiently with algorithms such as gradient descent or stochastic gradient descent.

To recap, an exponential family distribution is one whose probability density can be represented as $p(y; \eta) = b(y) \exp(\eta^T T(y) - a(\eta))$, where $\eta$ is the natural parameter of the distribution. Moreover, in a Generalized Linear Model, $\eta$ is modeled as $\theta^T x$, where $x \in \mathbb{R}^d$ are the input features of the example, and $\theta \in \mathbb{R}^d$ are learnable parameters.

In order to show that the NLL loss is convex for GLMs, we break down the process into subparts, and approach them one at a time. Our approach is to show that the second derivative (i.e., Hessian) of the loss w.r.t. the model parameters is Positive Semi-Definite (PSD) at all values of the model parameters. We will also show some nice properties of Exponential Family distributions as intermediate steps.

For the sake of convenience we restrict ourselves to the case where $\eta$ is a scalar. Assume $p(Y \mid X; \theta) \sim \text{ExponentialFamily}(\eta)$, where $\eta \in \mathbb{R}$ is a scalar, and $T(y) = y$. This makes the exponential family representation take the form $p(y; \eta) = b(y) \exp(\eta y - a(\eta))$.

(a) [6 points (Written)] Derive an expression for the mean of the distribution. Show that $E[Y; \eta] = \frac{\partial}{\partial \eta} a(\eta)$ (note that $E[Y; \eta] = E[Y \mid X; \theta]$ since $\eta = \theta^T x$). In other words, show that the mean of an exponential family distribution is the first derivative of the log-partition function with respect to the natural parameter.
Hint: Start with observing that $\frac{\partial}{\partial \eta} \int p(y; \eta)\,dy = \int \frac{\partial}{\partial \eta} p(y; \eta)\,dy$.

(b) [6 points (Written)] Next, derive an expression for the variance of the distribution. In particular, show that $\mathrm{Var}(Y; \eta) = \frac{\partial^2}{\partial \eta^2} a(\eta)$ (again, note that $\mathrm{Var}(Y; \eta) = \mathrm{Var}(Y \mid X; \theta)$). In other words, show that the variance of an exponential family distribution is the second derivative of the log-partition function w.r.t. the natural parameter.
Hint: Building upon the result in the previous subproblem can simplify the derivation.

(c) [6 points (Written)] Finally, write out the loss function $\ell(\theta)$, the NLL of the distribution, as a function of $\theta$. Then, calculate the Hessian of the loss w.r.t. $\theta$, and show that it is always PSD. This concludes the proof that the NLL loss of a GLM is convex.
Hint 1: Use the chain rule of calculus along with the results of the previous parts to simplify your derivations.
Hint 2: Recall that the variance of any probability distribution is non-negative.

Remark: The main takeaways from this problem are:
- The NLL loss of any GLM is convex in its model parameters.
- The exponential family of probability distributions is mathematically nice. Whereas calculating the mean and variance of distributions in general involves integrals (hard), surprisingly we can calculate them using derivatives (easy) for the exponential family.
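The three claims can be sanity-checked numerically before attempting the written derivations. The sketch below is not a proof and is not part of the original problem. It assumes a Poisson distribution as the example family (so $b(y) = 1/y!$ and $T(y) = y$), truncates the sum over $y$ at a hypothetical cutoff `y_max`, and uses central finite differences. All function names and the sample data `xs` and `theta` are illustrative assumptions.

```python
import math

def log_partition(eta, y_max=200):
    """a(eta) = log sum_y b(y) exp(eta * y), with b(y) = 1/y! (Poisson, assumed).
    The infinite sum is truncated at y_max and evaluated with log-sum-exp."""
    terms = [eta * y - math.lgamma(y + 1) for y in range(y_max + 1)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))

def moments(eta, y_max=200):
    """Mean and variance computed directly from p(y; eta) = b(y) exp(eta*y - a(eta))."""
    a = log_partition(eta, y_max)
    probs = [math.exp(eta * y - math.lgamma(y + 1) - a) for y in range(y_max + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

def first_derivative(f, x, h=1e-3):
    # Central difference approximation of f'(x).
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-3):
    # Central difference approximation of f''(x).
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def nll_hessian(theta, xs):
    """Hessian of the GLM NLL under the scalar-eta setup: sum_i a''(theta^T x_i) x_i x_i^T."""
    d = len(theta)
    H = [[0.0] * d for _ in range(d)]
    for x in xs:
        eta_i = sum(t * xi for t, xi in zip(theta, x))
        w = second_derivative(log_partition, eta_i)  # plays the role of Var(Y; eta_i)
        for i in range(d):
            for j in range(d):
                H[i][j] += w * x[i] * x[j]
    return H

# Parts (a) and (b): derivatives of a(eta) versus directly computed moments.
eta = 0.7
mean, var = moments(eta)
d1 = first_derivative(log_partition, eta)
d2 = second_derivative(log_partition, eta)
print("mean:", mean, "a'(eta):", d1)
print("var: ", var, "a''(eta):", d2)

# Part (c): a 2x2 Hessian is PSD iff its trace and determinant are both non-negative.
theta = (0.3, 0.4)                           # illustrative parameters
xs = [(1.0, 0.5), (1.0, -1.2), (1.0, 2.0)]   # illustrative inputs
H = nll_hessian(theta, xs)
trace = H[0][0] + H[1][1]
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
print("trace:", trace, "det:", det)
```

The Hessian here is a sum of outer products $x_i x_i^T$ weighted by $a''(\eta_i)$, which is why non-negativity of the variance (Hint 2) is the key ingredient in part (c).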