Calculating Probability

  1. A function P : A ⊂ Ω → R is called a probability law over the sample space Ω if it satisfies
    the following three probability axioms.
    • (Nonnegativity) P(A) ≥ 0, for every event A.
    • (Countable additivity) If A and B are two disjoint events, then the probability of their
    union satisfies
    P(A ∪ B) = P(A) + P(B).
    More generally, for a countable collection of disjoint events A1, A2, … we have
    P(⋃_{i=1}^∞ Ai) = Σ_{i=1}^∞ P(Ai).
    • (Normalization) The probability of the entire sample space is 1, that is, P(Ω) = 1.
    (a) (5 pts) Prove, using only the axioms of probability given, that P(A) = 1 − P(Aᶜ) for
    any event A and probability law P, where Aᶜ denotes the complement of A.
    (b) (5 pts) Let E1, E2, …, En be disjoint sets such that ⋃_{i=1}^n Ei = Ω and let P be a
    probability law over the sample space Ω. Show that, for any event A, we have
    P(A) = Σ_{i=1}^n P(A ∩ Ei).
    (c) (5 pts) Prove that for any two events A, B we have
    P(A ∩ B) ≥ P(A) + P(B) − 1.
  2. (10 pts) Two fair dice are thrown. Let
    X = 1 if the sum of the numbers is ≤ 5, and 0 otherwise,
    Y = 1 if the product of the numbers is odd, and 0 otherwise.
    What is Cov(X, Y)? Show your steps clearly.
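Because two dice have only 36 equally likely outcomes, this covariance can be sanity-checked by brute-force enumeration; a minimal sketch (variable names are illustrative, not part of the assignment):

```python
from itertools import product

# Enumerate all 36 equally likely outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))
n = len(outcomes)

# Indicator variables as defined in the problem.
X = [1 if a + b <= 5 else 0 for a, b in outcomes]
Y = [1 if (a * b) % 2 == 1 else 0 for a, b in outcomes]

EX = sum(X) / n                               # P(sum <= 5) = 10/36
EY = sum(Y) / n                               # P(product odd) = 9/36
EXY = sum(x * y for x, y in zip(X, Y)) / n    # P(both) = 3/36

cov = EXY - EX * EY                           # Cov(X, Y) = E[XY] - E[X]E[Y]
print(cov)                                    # 1/72 ≈ 0.01389
```

The enumeration also shows the intermediate probabilities needed for a hand derivation.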
  3. (10 pts) Derive the mean of the Poisson distribution.
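For reference, the mean follows directly from the definition of expectation; a sketch of the standard steps (re-indexing the sum and recognizing the exponential series):

```latex
\mathbb{E}[X] = \sum_{k=0}^{\infty} k \,\frac{\lambda^{k} e^{-\lambda}}{k!}
= \lambda e^{-\lambda} \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!}
= \lambda e^{-\lambda} e^{\lambda}
= \lambda .
```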
  4. In this problem, we will explore certain properties of probability distributions and introduce
    new important concepts.
    (a) (5 pts) Recall Pascal’s Identity for combinations:
C(N, m) = C(N − 1, m − 1) + C(N − 1, m),
where C(N, m) = N!/(m!(N − m)!) denotes the binomial coefficient.
Use the identity to show the following:
(1 + x)^N = Σ_{m=0}^N C(N, m) · x^m,
which is called the binomial theorem. Hint: You can use induction.
Finally, show that the binomial distribution with parameter p is normalized, that is,
Σ_{m=0}^N C(N, m) · p^m · (1 − p)^(N−m) = 1.
(b) (5 pts) Suppose you wish to transmit the value of a random variable to a receiver. In
Information Theory, the average amount of information you will transmit in the process
(in units of “nat”) is obtained by taking the expectation of − ln p(x) with respect to the
distribution p(x) of your random variable and is given by
H(x) = − ∫_{−∞}^{∞} p(x) · ln p(x) dx.
This quantity is the entropy of your random variable. Calculate and compare the entropies
of a uniform random variable x ∼ U(0, 1) and a Gaussian random variable
z ∼ N(0, 1).
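The two entropies in part (b) have well-known closed forms, H = 0 for U(0, 1) and ½ ln(2πe) ≈ 1.4189 for N(0, 1), which a crude numerical integration can confirm; a sketch using NumPy (the grid limits and resolution are arbitrary choices):

```python
import numpy as np

def entropy(pdf, lo, hi, n=200_001):
    """Approximate H = -∫ p(x) ln p(x) dx on [lo, hi] with a Riemann sum."""
    x = np.linspace(lo, hi, n)
    dx = x[1] - x[0]
    p = pdf(x)
    # Treat p * ln p as 0 where p == 0 (the integrand's limiting value).
    integrand = np.where(p > 0, -p * np.log(np.where(p > 0, p, 1.0)), 0.0)
    return integrand.sum() * dx

uniform_pdf = lambda x: np.where((x >= 0) & (x <= 1), 1.0, 0.0)
gauss_pdf = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)

h_uniform = entropy(uniform_pdf, 0.0, 1.0)   # analytic value: 0
h_gauss = entropy(gauss_pdf, -10.0, 10.0)    # analytic value: 0.5 * ln(2*pi*e)
print(h_uniform, h_gauss)
```

The Gaussian tail beyond |x| = 10 contributes negligibly, so truncating the integral there is safe.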
(c) In many applications, e.g. in Machine Learning, we wish to approximate some probability distribution using function approximators we have available, for example deep
neural networks. This creates the need for a way to measure the similarity or the distance between two distributions. One proposed such measure is the relative entropy
or the Kullback-Leibler divergence. Given two probability distributions p and q, the
KL-divergence between them is given by
KL(p||q) = ∫_{−∞}^{∞} p(x) · ln(p(x)/q(x)) dx.
i. (2 pts) Show that the KL-divergence between equal distributions is zero.
ii. (2 pts) Show that the KL-divergence is not symmetric, that is, KL(p||q) ≠ KL(q||p)
in general. You can do this by providing an example.
iii. (16 pts) Calculate the KL divergence between p(x) ∼ N(µ1, σ1²) and q(x) ∼ N(µ2, σ2²)
for µ1 = 2, µ2 = 1.8, σ1² = 1.5, σ2² = 0.2. First, derive a closed-form
solution depending on µ1, µ2, σ1, σ2. Then, calculate its value. (A numerical
answer without clearly showing your steps will not be graded.)
Remark: We call this measure a divergence, since a proper distance function must be
symmetric, which the KL-divergence in general is not.

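The closed form asked for in part iii is a standard result: KL(p||q) = ln(σ2/σ1) + (σ1² + (µ1 − µ2)²)/(2σ2²) − 1/2. A quick numerical evaluation at the given parameters (a check against your own derivation, not a substitute for it):

```python
import math

def kl_gauss(mu1, var1, mu2, var2):
    """KL( N(mu1, var1) || N(mu2, var2) ) for univariate Gaussians."""
    return (math.log(math.sqrt(var2) / math.sqrt(var1))
            + (var1 + (mu1 - mu2) ** 2) / (2 * var2)
            - 0.5)

# Parameters from part iii: mu1 = 2, mu2 = 1.8, var1 = 1.5, var2 = 0.2.
kl = kl_gauss(2.0, 1.5, 1.8, 0.2)
print(kl)   # ≈ 2.3425
```

Note that the divergence between identical Gaussians evaluates to exactly zero, consistent with part i.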
  1. In this problem, we will explore some properties of random variables and in particular that
    of the Gaussian random variable.
    (a) (7 pts) The convolution of two functions f and g is defined as
    (f ∗ g)(t) = ∫_{−∞}^{∞} f(τ) g(t − τ) dτ.
    One can calculate the probability density function of the random variable Z = X + Y
    using the convolution operation, with X and Y independent continuous random variables.
    In fact,
    fZ(z) = ∫_{−∞}^{∞} fX(τ) fY(z − τ) dτ.
    Using this fact, find the probability density function of Z = X + Y, where X and Y
    are independent standard Gaussian random variables. Find µZ and σZ. Which distribution
    does Z belong to? (Hint: use √π = ∫_{−∞}^{∞} e^(−x²) dx.)
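The result of part (a), that the sum of two independent standard Gaussians is again Gaussian with µZ = 0 and σZ = √2, can be checked empirically by sampling; a minimal sketch with NumPy (the seed is an arbitrary choice for reproducibility):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # X ~ N(0, 1)
y = rng.standard_normal(100_000)   # Y ~ N(0, 1), independent of X
z = x + y                          # Z = X + Y should be N(0, 2)

print(z.mean(), z.std())           # ≈ 0 and ≈ sqrt(2) ≈ 1.414
```

A histogram of `z` would show the familiar bell shape, only wider than the standard Gaussian.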
    (b) (5 pts) Let X be a standard Gaussian random variable and Y be a discrete
    random variable taking values {−1, 1} with equal probabilities. Is the random variable
    Z = XY independent of Y? Give a formal argument (proof or counterexample) justifying
    your answer.
    (c) (8 pts) Let X be a non-negative random variable. Let k be a positive real number.
    Define the binary random variable Y = 0 for X < k and Y = k for X ≥ k. Using the
    relation between X and Y, prove that P(X ≥ k) ≤ E[X]/k. (Hint: start by finding
    E[Y].)
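The bound in part (c) is Markov's inequality, and it is easy to observe empirically; a quick check with an exponential random variable (the choice of distribution and of the thresholds k is arbitrary, any non-negative variable works):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.exponential(scale=1.0, size=100_000)   # non-negative, E[X] = 1

for k in [0.5, 1.0, 2.0, 4.0]:
    empirical = (x >= k).mean()   # estimate of P(X >= k)
    bound = x.mean() / k          # Markov bound E[X] / k
    print(k, empirical, bound)    # empirical never exceeds the bound
```

The bound is loose for small k (it can even exceed 1) but always valid.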
  2. In this problem, we will empirically observe some of the results we obtained above and also
    the convergence properties of certain distributions. You may use the Python libraries NumPy
    and Matplotlib.
    (a) (5 pts) In 3.a you found the distribution of Z = X + Y. Let X and Y be Gaussian
    random variables with µX = −1, µY = 3, σX² = 1, and σY² = 4. Sample 100000
    pairs of X and Y and plot their sum Z = X + Y as a histogram. Are the shape of Z and its
    apparent mean consistent with what you have learned in the lectures?
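A minimal sketch of the sampling-and-plotting step, assuming the given parameters are variances (so the standard deviations passed as `scale` are 1 and 2; the seed and file name are arbitrary):

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = rng.normal(loc=-1.0, scale=1.0, size=100_000)   # mu_X = -1, var_X = 1
y = rng.normal(loc=3.0, scale=2.0, size=100_000)    # mu_Y = 3, var_Y = 4
z = x + y                                           # theory predicts N(2, 5)

plt.hist(z, bins=100, density=True)
plt.title(f"Z = X + Y (sample mean = {z.mean():.3f})")
plt.savefig("z_hist.png")                           # or plt.show() interactively
```

The sample mean should land near 2 and the histogram should look Gaussian, matching the convolution result from part 3.a.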
    (b) (5 pts) Let X ∼ B(n, p) be a binomially distributed random variable. One can use the
    normal distribution as an approximation to the binomial distribution when n is large and/or
    p is close to 0.5. In this case, X ≈ N(np, np(1 − p)). Show how this approximation behaves
    by drawing 10000 samples from the binomial distribution for each of n = 5, 10, 20, 30, 40, 100
    and p = 0.2, 0.33, 0.50, and plotting the distribution of samples for each case as a
    histogram. Report for which values of n and p the distribution resembles that of a
    Gaussian.
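The sampling loop for part (b) can be sketched as follows, comparing the empirical moments of each sample against the N(np, np(1 − p)) approximation (the seed is arbitrary; add a `plt.hist` call per case to produce the required histograms):

```python
import numpy as np

rng = np.random.default_rng(7)
for n in [5, 10, 20, 30, 40, 100]:
    for p in [0.2, 0.33, 0.5]:
        s = rng.binomial(n, p, size=10_000)
        # The normal approximation predicts mean np and variance np(1 - p).
        print(n, p, s.mean(), n * p, s.var(), n * p * (1 - p))
```

As n grows (and as p approaches 0.5), the histograms lose their skew and the discrete bars fill in a bell-shaped envelope.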
    (c) (5 pts) You were introduced to the concept of KL-divergence analytically. Now, you will
    estimate the divergence KL(p||q), where p(x) = N(0, 1) and q(x) = N(0, 4). Draw 1000
    samples from a Gaussian with mean 0 and variance 1. Call them x1, x2, …, x1000.
    Estimate the KL divergence as
    KL(p||q) ≈ (1/1000) · Σ_{i=1}^{1000} ln(p(xi)/q(xi)).
    Then calculate the divergence KL(p||q) analytically. Is the result consistent with your
    estimate?
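The estimator in part (c) is a plain Monte Carlo average of ln(p(x)/q(x)) over samples drawn from p; a sketch, with the analytic value obtained from the Gaussian closed form of part 4(c) (the seed is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)                 # x_i ~ p = N(0, 1)

def log_pdf_normal(x, mu, var):
    """Log-density of N(mu, var) evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

# Monte Carlo estimate: (1/1000) * sum_i ln( p(x_i) / q(x_i) ).
estimate = np.mean(log_pdf_normal(x, 0.0, 1.0) - log_pdf_normal(x, 0.0, 4.0))

# Analytic KL( N(0,1) || N(0,4) ) = ln(2) + 1/8 - 1/2 ≈ 0.3181.
analytic = np.log(2) + 1 / 8 - 1 / 2
print(estimate, analytic)
```

With only 1000 samples the estimate fluctuates around the analytic value; increasing the sample count shrinks the fluctuation as 1/√n.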

