Algorithms Q&A - Recent questions in MDP

Why is it useful to compute the marginal distribution instead of working with the full joint distribution?

Tue, 07 Apr 2026 22:16:23 +0000

Suppose we model weather using two variables:

X: Weather condition (Sunny, Cloudy, Rainy)
Y: Whether people carry an umbrella (Yes, No)

You are given the joint distribution:

	Umbrella (Yes)	Umbrella (No)	Total
Sunny	0.05	0.35	0.40
Cloudy	0.15	0.15	0.30
Rainy	0.25	0.05	0.30
Total	0.45	0.55	1.00

Core Question

If you are only interested in predicting whether people carry umbrellas, regardless of the actual weather: Why is it useful to compute the marginal distribution of umbrella usage instead of working with the full joint distribution? What do you gain and what do you lose by marginalizing out the weather variable?
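To make the question concrete, here is a minimal sketch of the marginalization itself, using the numbers from the table above. The dictionary layout and the helper name `marginal` are assumptions made for illustration, not part of the original question.

```python
# Joint distribution P(X, Y) copied from the table above.
joint = {
    ("Sunny", "Yes"): 0.05, ("Sunny", "No"): 0.35,
    ("Cloudy", "Yes"): 0.15, ("Cloudy", "No"): 0.15,
    ("Rainy", "Yes"): 0.25, ("Rainy", "No"): 0.05,
}

def marginal(joint, axis):
    """Sum out the other variable.

    axis=0 keeps X (weather); axis=1 keeps Y (umbrella).
    `marginal` is a hypothetical helper name chosen for this sketch.
    """
    result = {}
    for key, p in joint.items():
        result[key[axis]] = result.get(key[axis], 0.0) + p
    return result

p_umbrella = marginal(joint, axis=1)  # P(Y): the "Total" row of the table
p_weather = marginal(joint, axis=0)   # P(X): the "Total" column of the table

print(p_umbrella)
print(p_weather)
```

The six joint entries collapse into two numbers for P(Y), which is the gain in compactness; the per-weather breakdown can no longer be recovered from those two numbers alone.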

Some Discussion Points

Relevance to Modeling
- In what situations would an AI system care only about P(Y) rather than P(X,Y)?
- Is marginalization equivalent to “ignoring causes” in this context?
Hidden Variables / Latent State
- If weather were unobserved, how would marginalization help in modeling behavior?
- How does this relate to hidden state models (e.g., HMMs in AI)?
Decision-Making
- Suppose a city planner wants to estimate umbrella demand.
- Is P(Y) sufficient, or do they need P(Y|X)?
Information Loss
- What important causal relationship disappears when we marginalize out weather?
- Could two very different weather patterns produce the same marginal umbrella usage?
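
As a starting point for the information-loss questions, the sketch below computes P(Y|X) from the table and then builds a second joint distribution with the same marginals. The second joint is hypothetical: it is constructed here by assuming umbrella use is independent of weather, purely to show that P(Y) alone cannot distinguish the two cases. All function names are assumptions for this example.

```python
# Joint distribution P(X, Y) copied from the table in the question.
joint = {
    ("Sunny", "Yes"): 0.05, ("Sunny", "No"): 0.35,
    ("Cloudy", "Yes"): 0.15, ("Cloudy", "No"): 0.15,
    ("Rainy", "Yes"): 0.25, ("Rainy", "No"): 0.05,
}

def marginal_x(j):
    """P(X): sum over umbrella outcomes."""
    out = {}
    for (x, _y), p in j.items():
        out[x] = out.get(x, 0.0) + p
    return out

def marginal_y(j):
    """P(Y): sum over weather outcomes."""
    out = {}
    for (_x, y), p in j.items():
        out[y] = out.get(y, 0.0) + p
    return out

def conditional_y_given_x(j):
    """P(Y | X) = P(X, Y) / P(X)."""
    p_x = marginal_x(j)
    return {(x, y): p / p_x[x] for (x, y), p in j.items()}

cond = conditional_y_given_x(joint)
for x in ("Sunny", "Cloudy", "Rainy"):
    print(x, "P(Yes | X) =", round(cond[(x, "Yes")], 4))

# HYPOTHETICAL second joint: same P(X) and P(Y), but Y independent of X.
p_x, p_y = marginal_x(joint), marginal_y(joint)
independent_joint = {(x, y): p_x[x] * p_y[y] for x in p_x for y in p_y}

print(marginal_y(independent_joint))  # same marginal as the original table
```

In the original table P(Yes | X) varies strongly with the weather, while in the hypothetical joint it is the same for every weather condition, yet both give the same P(Y).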

Evaluate an MDP given several observed episodes

Thu, 11 May 2023 12:58:02 +0000

Let's say you are given a '+'-shaped MDP with five states and a gamma (discount rate) of 1:

Given MDP

_ A _
B C D
_ E _

The input policy π is as follows:

A -> Terminal
B -> C
C -> D
D -> Terminal
E -> C

Suppose, however, that you have the following observed (training) episodes:

Episode 1:
B, east, C, -1
C, east, D, -1
D, exit, x, +10

Episode 2:
B, east, C, -1
C, east, D, -1
D, exit, x, +10

Episode 3:
E, north, C, -1
C, east, D, -1
D, exit, x, +10

Episode 4:
E, north, C, -1
C, north, A, -1
A, exit, x, -10

What are the estimated state values based on these episodes?
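
One way to read this question is as direct evaluation: average the observed return from each state over every visit. That reading is an assumption; a temporal-difference method would process the same episodes differently. A minimal sketch, with the episode tuples transcribed from above and `direct_evaluation` as a hypothetical helper name:

```python
from collections import defaultdict

GAMMA = 1.0  # discount rate given in the question

# Each step is (state, action, next_state, reward), transcribed from the episodes.
episodes = [
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("B", "east", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "east", "D", -1), ("D", "exit", "x", 10)],
    [("E", "north", "C", -1), ("C", "north", "A", -1), ("A", "exit", "x", -10)],
]

def direct_evaluation(episodes, gamma):
    """Average the discounted return observed from each visited state."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        ret = 0.0
        # Walk backwards so `ret` is the return from the current step onward.
        for state, _action, _next_state, reward in reversed(episode):
            ret = reward + gamma * ret
            totals[state] += ret
            counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

values = direct_evaluation(episodes, GAMMA)
for state in sorted(values):
    print(state, values[state])
```

Under this reading, C is visited four times with returns 9, 9, 9 and -11, and E twice with returns 8 and -12, which is why B and E end up with different estimates even though both lead to C.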

Solve the V* values for this MDP - 5x5

Tue, 30 Mar 2021 14:38:01 +0000

Estimate the final V* values for a given grid world.

Discount (gamma) is 0.9.
Noise: the transition probabilities are 0.8, 0.1, 0.1. That is, when moving in a direction, there is an 80% probability of going in that direction and a 10% probability of going in each of the two perpendicular directions.
Terminal states are given in the table. All non terminal states are left blank and need to be solved.
Living reward (R(s,a,s')) is 0.

Instead of value iteration, solve the Bellman equations directly.

1	0	1

0		0

1	0	1

Jumpy Car - Estimate/Solve the V* values for the following MDP

Mon, 01 Mar 2021 23:11:54 +0000

You are traveling on a straight road, but have a jumpy car. The car sometimes "jumps" (moves double). At other times, it doesn't move at all. The following MDP has been created to model this behavior and the landscape.

Estimate the V* values (optimal values for the states) for this MDP:

S1	S2	S3	S4	S5	S6	S7	S8	S9
sqrt(3)				100				sqrt(3)

The MDP is defined as follows: there are two actions, L (Left) and R (Right). When moving left, there is a 40% chance of moving one spot left, a 50% chance of moving DOUBLE (2 spots left), and a 10% chance of not moving at all. Similarly, when moving right, there is a 40% chance of moving one spot right, a 50% chance of moving DOUBLE (2 spots right), and a 10% chance of not moving at all.

S1, S5 and S9 are terminal states with values sqrt(3), 100 and sqrt(3) respectively.

Discount factor is 0.9 (so, gamma = 0.9).

There is no living reward, that is R(s,a,s’) = 0.
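
The problem statement leaves a few details open, so the following value-iteration sketch makes them explicit as assumptions: (1) a double move jumps over the middle square without stopping, so from S4 moving right the car lands on S6 with probability 0.5; (2) a move that would leave the road is clamped to the end square; (3) terminal values are held fixed and discounted once when entered. Different conventions would give different numbers.

```python
import math

GAMMA = 0.9
N = 9  # states S1..S9 are stored at indices 0..8
TERMINALS = {0: math.sqrt(3), 4: 100.0, 8: math.sqrt(3)}
# (number of spots moved, probability) for either action
OUTCOMES = [(1, 0.4), (2, 0.5), (0, 0.1)]

def q_value(V, s, direction):
    """Expected discounted value of acting in `direction` (-1 = L, +1 = R)."""
    total = 0.0
    for steps, prob in OUTCOMES:
        # ASSUMPTION: moves past either end of the road are clamped.
        nxt = min(max(s + direction * steps, 0), N - 1)
        total += prob * GAMMA * V[nxt]  # living reward is 0
    return total

def value_iteration(tol=1e-10, max_iters=10_000):
    V = [TERMINALS.get(s, 0.0) for s in range(N)]
    for _ in range(max_iters):
        new_V = list(V)
        for s in range(N):
            if s in TERMINALS:
                continue  # ASSUMPTION: terminal values stay fixed
            new_V[s] = max(q_value(V, s, -1), q_value(V, s, +1))
        if max(abs(a - b) for a, b in zip(V, new_V)) < tol:
            return new_V
        V = new_V
    return V

V = value_iteration()
for i, v in enumerate(V, start=1):
    print(f"S{i}: {v:.4f}")
```

Because the road and the rewards are symmetric around S5, the values of S2/S8, S3/S7 and S4/S6 should match under these assumptions, which is a quick sanity check on any hand-derived answer.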

Solve the V* values for this MDP

Tue, 23 Feb 2021 22:40:41 +0000

For the given MDP, find the values for the states S2, S3 and S4. States S1 and S5 are terminal states with values 0 and 1 respectively. The living reward (R) is 0. The transition function is defined as follows: when going Left or Right, there is a 90% probability that the move goes as planned and a 10% probability that no move occurs. The discount rate gamma is 0.9.

S1	S2	S3	S4	S5
0	x2	x3	x4	1
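
A sketch of one way to solve this, assuming the terminal values are held fixed and discounted once when entered. If moving Right is optimal in S4, the Bellman equation is x4 = 0.9 (0.9 · 1 + 0.1 · x4), so x4 = 0.81 / 0.91, and the same ratio applies again for x3 and x2. The code below checks that closed form against plain value iteration; the variable names are chosen for this example only.

```python
GAMMA = 0.9
P_MOVE, P_STAY = 0.9, 0.1

# Index 0..4 corresponds to S1..S5; S1 and S5 are terminal with fixed values.
V = [0.0, 0.0, 0.0, 0.0, 1.0]

def backup(V, s, direction):
    """One-step lookahead for moving Left (-1) or Right (+1); living reward is 0."""
    return GAMMA * (P_MOVE * V[s + direction] + P_STAY * V[s])

# Value iteration over the three non-terminal states.
for _ in range(1000):
    V = [V[0]] + [max(backup(V, s, -1), backup(V, s, +1)) for s in (1, 2, 3)] + [V[4]]

# Closed form, assuming Right is optimal everywhere: each step left of S5
# multiplies the value by GAMMA * P_MOVE / (1 - GAMMA * P_STAY) = 0.81 / 0.91.
ratio = GAMMA * P_MOVE / (1 - GAMMA * P_STAY)
closed_form = {"x2": ratio ** 3, "x3": ratio ** 2, "x4": ratio}

print("iterated:   ", [round(v, 4) for v in V[1:4]])
print("closed form:", {k: round(v, 4) for k, v in closed_form.items()})
```

Moving Left from S2 can only reach the terminal with value 0, so Right dominates in every state and the two computations agree.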