<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Algorithms Q&amp;A - Recent questions and answers in MDP</title>
<link>https://notexponential.com/qa/artificial-intelligence/markov-decision-processes</link>
<description>Powered by Question2Answer</description>
<item>
<title>Answered: Jumpy Car - Estimate/Solve the V* values for the following MDP</title>
<link>https://notexponential.com/755/jumpy-car-estimate-solve-the-v-values-for-the-following-mdp?show=1077#a1077</link>
<description>&lt;p class=&quot;p1&quot;&gt;Hello Professor,&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;Since S5 has a terminal value of 100, far above the 1.732 at S1 and S9, the optimal strategy is to move toward S5. So for the states on the left side (S2, S3, and S4), the best action is R, and for the states on the right side (S6, S7, and S8), the best action is L.&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;The terminal state values are:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S1) = 1.732&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S5) = 100&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S9) = 1.732&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;Because the MDP is symmetric around S5:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S2) = V(S8)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S3) = V(S7)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S4) = V(S6)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;Let:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;x = V(S2) = V(S8)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;y = V(S3) = V(S7)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;z = V(S4) = V(S6)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;The discount factor is 0.9.
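&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;The three right-moving equations for S2, S3, and S4 can be cross-checked by iterative policy evaluation: start from zero and apply the equations repeatedly until the values stop changing. This is a minimal sketch under the same assumptions the equations in this answer make: discount 0.9, no living reward, and a right move that advances one space with probability 0.4, jumps two with probability 0.5, and stays put with probability 0.1. The names GAMMA and evaluate_move_right are hypothetical.&lt;/p&gt;&lt;pre&gt;
```python
# Iterative policy evaluation for the fixed "move right" policy on S2-S4.
# Assumed model (read off the equations in this answer, not the original
# problem statement): gamma = 0.9, no living reward, and a right move that
# advances one space w.p. 0.4, jumps two spaces w.p. 0.5, stays put w.p. 0.1.
GAMMA = 0.9
P_ONE, P_TWO, P_STAY = 0.4, 0.5, 0.1
V_GOAL = 100.0  # terminal value of S5


def evaluate_move_right(iterations: int = 500) -> dict:
    x = y = z = 0.0  # V(S2), V(S3), V(S4); S6-S8 mirror them by symmetry
    for _ in range(iterations):
        # From S4 a two-space jump overshoots to S6, whose value equals z.
        new_z = GAMMA * (P_ONE * V_GOAL + P_TWO * z + P_STAY * z)
        # From S3 a two-space jump lands exactly on S5.
        new_y = GAMMA * (P_ONE * z + P_TWO * V_GOAL + P_STAY * y)
        # From S2 the car reaches S3 or jumps to S4.
        new_x = GAMMA * (P_ONE * y + P_TWO * z + P_STAY * x)
        x, y, z = new_x, new_y, new_z
    return {"S2": x, "S3": y, "S4": z}


values = evaluate_move_right()
print({state: round(v, 2) for state, v in values.items()})
# {'S2': 70.51, 'S3': 80.41, 'S4': 78.26}
```
&lt;/pre&gt;&lt;p class=&quot;p1&quot;&gt;Each sweep shrinks the error by a factor of at most 0.9. So 500 sweeps are far more than enough to reach the 78.26, 80.41, and 70.51 found by substitution.&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;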
There is no living reward, so the value comes only from discounted future values.&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;For S4, moving right reaches S5 with probability 0.4, jumps two spaces to S6 (worth z by symmetry) with probability 0.5, and stays in S4 with probability 0.1:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;z = 0.9(0.4(100) + 0.5z + 0.1z)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;z = 0.9(40 + 0.6z)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;z = 36 + 0.54z&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;0.46z = 36&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;z = 78.26&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;So:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S4) = V(S6) = 78.26&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;For S3, moving right reaches S4 with probability 0.4, jumps two spaces straight to S5 with probability 0.5, and stays in S3 with probability 0.1:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;y = 0.9(0.4z + 0.5(100) + 0.1y)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;Substituting z = 78.26:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;y = 0.9(0.4(78.26) + 50 + 0.1y)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;y = 73.17 + 0.09y&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;0.91y = 73.17&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;y = 80.41&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;So:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S3) = V(S7) = 80.41&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;For S2, moving right reaches S3 with probability 0.4, jumps two spaces to S4 with probability 0.5, and stays in S2 with probability 0.1:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;x = 0.9(0.4y + 0.5z + 0.1x)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;Substituting y = 80.41 and z = 78.26:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;x = 0.9(0.4(80.41) + 0.5(78.26) + 0.1x)&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;x = 64.16 + 0.09x&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;0.91x = 64.16&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;x = 70.51&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;So:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S2) = V(S8) = 70.51&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;Therefore, the final values are:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S1) = 1.732&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S2) = 70.51&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S3) = 80.41&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S4) = 78.26&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S5) = 100&lt;/p&gt;
&lt;p class=&quot;p1&quot;&gt;V(S6) = 78.26&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S7) = 80.41&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S8) = 70.51&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;V(S9) = 1.732&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;Final answer:&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;[1.732, 70.51, 80.41, 78.26, 100, 78.26, 80.41, 70.51, 1.732]&lt;/p&gt;&lt;p class=&quot;p1&quot;&gt;The values do not increase steadily toward S5: S3 (80.41) is worth more than S4 (78.26), even though S4 is closer. From S3, the more likely two-space jump (probability 0.5) lands exactly on S5, while from S4 the same jump overshoots to S6. So being one step away from S5 is not always better than being two steps away; it depends on the transition probabilities.&lt;/p&gt;</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/755/jumpy-car-estimate-solve-the-v-values-for-the-following-mdp?show=1077#a1077</guid>
<pubDate>Sun, 26 Apr 2026 13:09:30 +0000</pubDate>
</item>
<item>
<title>Answered: Why is it useful to compute the marginal distribution instead of working with the full joint distribution?</title>
<link>https://notexponential.com/996/compute-marginal-distribution-instead-working-distribution?show=1030#a1030</link>
<description>Hello,&lt;br /&gt;
&lt;br /&gt;
Here is my solution based on the given joint table:&lt;br /&gt;
&lt;br /&gt;
We are interested only in Y = umbrella usage, not in X = weather condition.&lt;br /&gt;
&lt;br /&gt;
So instead of using the full joint distribution P(X,Y), it is useful to compute the marginal distribution:&lt;br /&gt;
&lt;br /&gt;
P(Y) = sum over all weather conditions of P(X,Y)&lt;br /&gt;
&lt;br /&gt;
From the table:&lt;br /&gt;
&lt;br /&gt;
P(Umbrella = Yes) = 0.05 + 0.15 + 0.25 = 0.45&lt;br /&gt;
&lt;br /&gt;
P(Umbrella = No) = 0.35 + 0.15 + 0.05 = 0.55&lt;br /&gt;
&lt;br /&gt;
So the marginal distribution of umbrella usage is:&lt;br /&gt;
&lt;br /&gt;
P(Y) =&lt;br /&gt;
&lt;br /&gt;
Yes -&amp;gt; 0.45&lt;br /&gt;
&lt;br /&gt;
No -&amp;gt; 0.55&lt;br /&gt;
&lt;br /&gt;
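The marginalization step can be sketched in a few lines of Python. The joint table is not reproduced in this answer, so the dictionary below is a hypothetical reconstruction. It assumes the three terms in each sum are listed in the order sunny, cloudy, rainy, which is consistent with the observation below that umbrella use is lowest on sunny days and highest on rainy ones. JOINT and marginal_umbrella are illustrative names.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical reconstruction of the joint table P(Weather, Umbrella),
# assuming each sum in this answer lists sunny, cloudy, rainy in that order.
JOINT = {
    ("sunny", "yes"): 0.05, ("cloudy", "yes"): 0.15, ("rainy", "yes"): 0.25,
    ("sunny", "no"): 0.35, ("cloudy", "no"): 0.15, ("rainy", "no"): 0.05,
}


def marginal_umbrella(joint: dict) -> dict:
    """Sum P(X = x, Y = y) over every weather value x to get P(Y = y)."""
    p_y = {}
    for (_weather, umbrella), p in joint.items():
        p_y[umbrella] = p_y.get(umbrella, 0.0) + p
    return p_y


p_y = marginal_umbrella(JOINT)
print({y: round(p, 2) for y, p in p_y.items()})  # {'yes': 0.45, 'no': 0.55}
```
&lt;/pre&gt;
The weather variable simply disappears from the keys of the result. That is the information loss discussed below.&lt;br /&gt;
&lt;br /&gt;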
Why this is useful:&lt;br /&gt;
&lt;br /&gt;
If the only goal is to predict whether people carry umbrellas overall, then P(Y) is enough.&lt;br /&gt;
&lt;br /&gt;
It gives a direct answer to the question of interest without keeping extra information about weather.&lt;br /&gt;
&lt;br /&gt;
So the gain is:&lt;br /&gt;
&lt;br /&gt;
Simpler model&lt;br /&gt;
&lt;br /&gt;
Less computation&lt;br /&gt;
&lt;br /&gt;
Easier prediction if weather is unknown or irrelevant&lt;br /&gt;
&lt;br /&gt;
Direct estimate of total umbrella usage in the population&lt;br /&gt;
&lt;br /&gt;
This is useful in AI when the system only cares about the final behavior, not the reason behind it.&lt;br /&gt;
&lt;br /&gt;
For example:&lt;br /&gt;
&lt;br /&gt;
Estimating total umbrella demand in shops&lt;br /&gt;
&lt;br /&gt;
Planning inventory&lt;br /&gt;
&lt;br /&gt;
Predicting how many umbrellas may be seen in public places&lt;br /&gt;
&lt;br /&gt;
Any case where only the final action matters&lt;br /&gt;
&lt;br /&gt;
What we lose by marginalizing:&lt;br /&gt;
&lt;br /&gt;
When we marginalize out weather, we remove the connection between weather and umbrella usage.&lt;br /&gt;
&lt;br /&gt;
So we lose the explanatory (and possibly causal) relationship.&lt;br /&gt;
&lt;br /&gt;
From the joint table we can see:&lt;br /&gt;
&lt;br /&gt;
On sunny days, umbrella use is very low&lt;br /&gt;
&lt;br /&gt;
On rainy days, umbrella use is very high&lt;br /&gt;
&lt;br /&gt;
But after marginalization, we only know that 45% of people carry umbrellas overall&lt;br /&gt;
&lt;br /&gt;
We no longer know why&lt;br /&gt;
&lt;br /&gt;
So marginalization does not claim that causes do not exist; it means we choose not to represent them in the model.&lt;br /&gt;
&lt;br /&gt;
The cause may still be there, but it becomes hidden from our final distribution.&lt;br /&gt;
&lt;br /&gt;
If weather is unobserved:&lt;br /&gt;
&lt;br /&gt;
Marginalization helps because we can still model umbrella behavior even when X is missing.&lt;br /&gt;
&lt;br /&gt;
That is useful when weather is a hidden variable.&lt;br /&gt;
&lt;br /&gt;
In this sense, it is related to hidden state models such as HMMs.&lt;br /&gt;
&lt;br /&gt;
In HMM-like thinking:&lt;br /&gt;
&lt;br /&gt;
Weather can be treated as hidden state&lt;br /&gt;
&lt;br /&gt;
Umbrella usage can be treated as observed behavior&lt;br /&gt;
&lt;br /&gt;
Then even if we do not directly observe weather, we can still reason about observed actions through probabilities&lt;br /&gt;
&lt;br /&gt;
Decision-making part:&lt;br /&gt;
&lt;br /&gt;
If a city planner wants only the overall average umbrella demand, then P(Y) may be sufficient.&lt;br /&gt;
&lt;br /&gt;
It tells the planner that about 45% of people carry umbrellas.&lt;br /&gt;
&lt;br /&gt;
But if the planner wants better forecasting under different conditions, then P(Y) is not enough.&lt;br /&gt;
&lt;br /&gt;
They would need conditional probabilities such as P(Y|X).&lt;br /&gt;
&lt;br /&gt;
This is because umbrella demand clearly changes depending on whether it is sunny, cloudy, or rainy.&lt;br /&gt;
&lt;br /&gt;
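The conditional P(Y|X) is obtained by dividing each cell of the joint table by the total probability of its weather row, P(X = x). The table below is a hypothetical reconstruction that assumes each sum in this answer lists sunny, cloudy, rainy in that order. conditional_umbrella is an illustrative name.&lt;br /&gt;
&lt;pre&gt;
```python
# Hypothetical reconstruction of the joint table P(Weather, Umbrella),
# assuming each sum in this answer lists sunny, cloudy, rainy in that order.
JOINT = {
    ("sunny", "yes"): 0.05, ("cloudy", "yes"): 0.15, ("rainy", "yes"): 0.25,
    ("sunny", "no"): 0.35, ("cloudy", "no"): 0.15, ("rainy", "no"): 0.05,
}


def conditional_umbrella(joint: dict) -> dict:
    """Return P(Umbrella = u | Weather = w) = P(w, u) / P(w) for every w."""
    p_weather = {}
    for (weather, _umbrella), p in joint.items():
        p_weather[weather] = p_weather.get(weather, 0.0) + p  # P(X = w)
    return {
        weather: {u: joint[(weather, u)] / p_weather[weather] for u in ("yes", "no")}
        for weather in p_weather
    }


for weather, dist in conditional_umbrella(JOINT).items():
    print(weather, round(dist["yes"], 3))
```
&lt;/pre&gt;
Under the assumed ordering, this gives P(Yes | sunny) = 0.05/0.40 = 0.125, P(Yes | cloudy) = 0.15/0.30 = 0.5, and P(Yes | rainy) = 0.25/0.30 ≈ 0.833. That spread is exactly the weather dependence that P(Y) hides.&lt;br /&gt;
&lt;br /&gt;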
Important information loss:&lt;br /&gt;
&lt;br /&gt;
The main thing that disappears is the dependence between X and Y.&lt;br /&gt;
&lt;br /&gt;
In other words, we lose the structure of how weather affects umbrella usage.&lt;br /&gt;
&lt;br /&gt;
This is important because two different environments could have the same marginal P(Y), even if their weather patterns are very different.&lt;br /&gt;
&lt;br /&gt;
So yes, two very different weather distributions could produce the same overall umbrella usage.&lt;br /&gt;
&lt;br /&gt;
That means P(Y) alone cannot tell us what is causing the behavior.&lt;br /&gt;
&lt;br /&gt;
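A small example shows why P(Y) alone cannot reveal the cause. Environment A is a hypothetical reconstruction of the joint table, assuming each sum in this answer lists sunny, cloudy, rainy in that order. Environment B is invented purely for illustration: the weather is mostly rainy, but umbrella use ignores the weather entirely. Both have the same marginal, but their conditionals differ sharply.&lt;br /&gt;
&lt;pre&gt;
```python
# Environment A: hypothetical reconstruction of the joint table in this answer,
# assuming each sum lists sunny, cloudy, rainy in that order.
ENV_A = {
    ("sunny", "yes"): 0.05, ("cloudy", "yes"): 0.15, ("rainy", "yes"): 0.25,
    ("sunny", "no"): 0.35, ("cloudy", "no"): 0.15, ("rainy", "no"): 0.05,
}

# Environment B: invented for illustration. The weather is mostly rainy, but
# umbrella use ignores it: P(Umbrella = yes | weather) = 0.45 for every weather.
WEATHER_B = {"sunny": 0.2, "cloudy": 0.2, "rainy": 0.6}
ENV_B = {}
for weather, p_weather in WEATHER_B.items():
    ENV_B[(weather, "yes")] = p_weather * 0.45
    ENV_B[(weather, "no")] = p_weather * 0.55


def p_yes(joint):
    """Marginal P(Umbrella = yes): sum the joint over all weather values."""
    return sum(p for (_weather, umbrella), p in joint.items() if umbrella == "yes")


def p_yes_given(joint, weather):
    """Conditional P(Umbrella = yes | weather) = P(weather, yes) / P(weather)."""
    return joint[(weather, "yes")] / (joint[(weather, "yes")] + joint[(weather, "no")])


print(round(p_yes(ENV_A), 2), round(p_yes(ENV_B), 2))  # 0.45 0.45
print(round(p_yes_given(ENV_A, "rainy"), 2), round(p_yes_given(ENV_B, "rainy"), 2))  # 0.83 0.45
```
&lt;/pre&gt;
&lt;br /&gt;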
So as a final conclusion:&lt;br /&gt;
&lt;br /&gt;
Marginalizing out weather is useful when we only care about predicting umbrella usage overall, because it gives a simpler and more direct distribution:&lt;br /&gt;
&lt;br /&gt;
P(Umbrella = Yes) = 0.45&lt;br /&gt;
&lt;br /&gt;
P(Umbrella = No) = 0.55&lt;br /&gt;
&lt;br /&gt;
What we gain:&lt;br /&gt;
&lt;br /&gt;
Simplicity&lt;br /&gt;
&lt;br /&gt;
Lower complexity&lt;br /&gt;
&lt;br /&gt;
Usable model even when weather is hidden or irrelevant&lt;br /&gt;
&lt;br /&gt;
What we lose:&lt;br /&gt;
&lt;br /&gt;
The relationship between weather and umbrella behavior&lt;br /&gt;
&lt;br /&gt;
Explanatory and causal information&lt;br /&gt;
&lt;br /&gt;
Ability to make condition-specific decisions&lt;br /&gt;
&lt;br /&gt;
So P(Y) is enough for overall prediction, but not enough for deeper understanding or better weather-dependent decision making.</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/996/compute-marginal-distribution-instead-working-distribution?show=1030#a1030</guid>
<pubDate>Thu, 16 Apr 2026 07:44:10 +0000</pubDate>
</item>
<item>
<title>Answered: Evaluate an MDP given several observed episodes</title>
<link>https://notexponential.com/821/evaluate-an-mdp-given-several-observed-episodes?show=822#a822</link>
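<description>In direct evaluation, each state&amp;#039;s value is estimated as the average of the returns observed from that state. A return is the sum of rewards from that state to the end of an episode (no discounting is applied here).&lt;br /&gt;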
&lt;br /&gt;
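The averaging can be sketched in Python. The per-step rewards below are an assumption backed out of the path sums in this answer: -1 for each move, +10 for exiting from D, and -10 for exiting from A, with no discounting. The episode encoding and the name direct_evaluation are hypothetical.&lt;br /&gt;
&lt;pre&gt;
```python
from collections import defaultdict

# Hypothetical encoding of the four observed episodes as (state, reward) pairs,
# where the reward is the one received on leaving that state. The rewards are
# backed out of the path sums in this answer (-1 per move, +10 / -10 on exit).
EPISODES = [
    [("B", -1), ("C", -1), ("D", 10)],   # Episode 1
    [("B", -1), ("C", -1), ("D", 10)],   # Episode 2
    [("E", -1), ("C", -1), ("D", 10)],   # Episode 3
    [("E", -1), ("C", -1), ("A", -10)],  # Episode 4
]


def direct_evaluation(episodes):
    """Average the undiscounted return observed from each visit to each state."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for episode in episodes:
        rewards = [reward for _, reward in episode]
        for i, (state, _) in enumerate(episode):
            totals[state] += sum(rewards[i:])  # return from this visit to the exit
            counts[state] += 1
    return {state: totals[state] / counts[state] for state in sorted(totals)}


print(direct_evaluation(EPISODES))
# {'A': -10.0, 'B': 8.0, 'C': 4.0, 'D': 10.0, 'E': -2.0}
```
&lt;/pre&gt;
Each state appears at most once per episode here, so first-visit and every-visit averaging give the same numbers.&lt;br /&gt;
&lt;br /&gt;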
For A: &lt;br /&gt;
Episode 4: A -&amp;gt; x (sum = -10)&lt;br /&gt;
A = average( path sums ) = -10&lt;br /&gt;
&lt;br /&gt;
For B:&lt;br /&gt;
Episode 1: B -&amp;gt; C -&amp;gt; D -&amp;gt; x (sum = +8)&lt;br /&gt;
Episode 2: B -&amp;gt; C -&amp;gt; D -&amp;gt; x (sum = +8)&lt;br /&gt;
B = average( path sums ) = (8 + 8)/2 = 8&lt;br /&gt;
&lt;br /&gt;
For C:&lt;br /&gt;
Episode 1: C -&amp;gt; D -&amp;gt; x (sum = +9)&lt;br /&gt;
Episode 2: C -&amp;gt; D -&amp;gt; x (sum = +9)&lt;br /&gt;
Episode 3: C -&amp;gt; D -&amp;gt; x (sum = +9)&lt;br /&gt;
Episode 4: C -&amp;gt; A -&amp;gt; x (sum = -11)&lt;br /&gt;
C = average( path sums ) = (9 + 9 + 9 - 11)/4 = 4&lt;br /&gt;
&lt;br /&gt;
For D:&lt;br /&gt;
Episode 1: D -&amp;gt; x (sum = +10)&lt;br /&gt;
Episode 2: D -&amp;gt; x (sum = +10)&lt;br /&gt;
Episode 3: D -&amp;gt; x (sum = +10)&lt;br /&gt;
D = average( path sums ) = +10&lt;br /&gt;
&lt;br /&gt;
For E:&lt;br /&gt;
Episode 3: E -&amp;gt; C -&amp;gt; D -&amp;gt; x (sum = +8)&lt;br /&gt;
Episode 4: E -&amp;gt; C -&amp;gt; A -&amp;gt; x (sum = -12)&lt;br /&gt;
E = average( path sums ) = (8 - 12)/2 = -2</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/821/evaluate-an-mdp-given-several-observed-episodes?show=822#a822</guid>
<pubDate>Thu, 11 May 2023 12:58:28 +0000</pubDate>
</item>
<item>
<title>Answered: Solve the V* values for this MDP</title>
<link>https://notexponential.com/754/solve-the-v-values-for-this-mdp?show=813#a813</link>
<description>x4 = 0 + 0.9(0.9 * 1 + 0.1 * x4) =&amp;gt; 0.91 * x4 = 0.81 =&amp;gt; x4 = 81/91 ≈ 0.890110&lt;br /&gt;
&lt;br /&gt;
x3 = 0 + 0.9(0.9 * x4 + 0.1 * x3) =&amp;gt; 0.91 * x3 = 0.81 * x4 =&amp;gt; x3 = (81/91)^2 ≈ 0.792296&lt;br /&gt;
&lt;br /&gt;
x2 = 0 + 0.9(0.9 * x3 + 0.1 * x2) =&amp;gt; 0.91 * x2 = 0.81 * x3 =&amp;gt; x2 = (81/91)^3 ≈ 0.705230</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/754/solve-the-v-values-for-this-mdp?show=813#a813</guid>
<pubDate>Mon, 08 May 2023 06:08:38 +0000</pubDate>
</item>
<item>
<title>Answered: Solve the V* values for this MDP - 5x5</title>
<link>https://notexponential.com/758/solve-the-v-values-for-this-mdp-5x5?show=806#a806</link>
<description>At first glance, with its 17 empty spaces, this problem looks long. However, we can take advantage of the symmetry to reduce the number of variables to solve for to four. In the grid below, 01 and 00 mark the cells whose values are fixed at 1 and 0.&lt;br /&gt;
&lt;br /&gt;
01 x1 00 x1 01&lt;br /&gt;
x1 x2 x3 x2 x1&lt;br /&gt;
00 x3 x4 x3 00&lt;br /&gt;
x1 x2 x3 x2 x1&lt;br /&gt;
01 x1 00 x1 01&lt;br /&gt;
&lt;br /&gt;
We can also read off the optimal policy for each cell directly, which avoids computing the value of every action and taking a maximum.&lt;br /&gt;
&lt;br /&gt;
- Spaces with x1 should aim toward the adjacent corner terminal worth 1&lt;br /&gt;
- Spaces with x2 should move toward the x1 spaces&lt;br /&gt;
- Spaces with x3 should move toward x2 (since x4 in the middle is the farthest away from the terminals)&lt;br /&gt;
- Spaces with x4 should move toward x3 (no real choice)&lt;br /&gt;
&lt;br /&gt;
Now, the value of a state s under a fixed policy pi is&lt;br /&gt;
&lt;br /&gt;
V(s) = sum (over all next states s&amp;#039;) of T(s,pi(s),s&amp;#039;)*[R(s,pi(s),s&amp;#039;) + gamma*V(s&amp;#039;)]&lt;br /&gt;
&lt;br /&gt;
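The symmetry and policy shortcuts can be double-checked by brute force. The sketch below runs value iteration on all 25 cells. Value iteration applies the formula above with a maximum over the four actions instead of a fixed policy, so it does not rely on the policy guesses. It assumes what the equations in this answer assume: discount 0.9, zero reward per move, an intended move that succeeds with probability 0.8 and slips to each perpendicular direction with probability 0.1, and a wall bump that leaves the agent in place. FIXED, step, and value_iteration are hypothetical names.&lt;br /&gt;
&lt;pre&gt;
```python
# Value iteration on the 5x5 grid from this answer. Assumed model (read off
# the equations in this answer): gamma = 0.9, zero reward per move, 0.8 chance
# of the intended move, 0.1 for each perpendicular slip, walls keep you in place.
GAMMA = 0.9
SIZE = 5
# Cells with fixed values: corners (01) are worth 1, edge midpoints (00) are worth 0.
FIXED = {(0, 0): 1.0, (0, 4): 1.0, (4, 0): 1.0, (4, 4): 1.0,
         (0, 2): 0.0, (2, 0): 0.0, (2, 4): 0.0, (4, 2): 0.0}
MOVES = {"N": (-1, 0), "S": (1, 0), "W": (0, -1), "E": (0, 1)}
SLIPS = {"N": ("W", "E"), "S": ("W", "E"), "W": ("N", "S"), "E": ("N", "S")}


def step(cell, move):
    """Deterministic result of one move; bumping into a wall leaves the cell unchanged."""
    row, col = cell[0] + MOVES[move][0], cell[1] + MOVES[move][1]
    if row in range(SIZE) and col in range(SIZE):
        return (row, col)
    return cell


def value_iteration(iterations=300):
    v = {(r, c): FIXED.get((r, c), 0.0) for r in range(SIZE) for c in range(SIZE)}
    for _ in range(iterations):
        new_v = dict(v)
        for cell in v:
            if cell in FIXED:
                continue  # terminal cells keep their value
            q_values = []
            for move, (slip_a, slip_b) in SLIPS.items():
                expected = (0.8 * v[step(cell, move)]
                            + 0.1 * v[step(cell, slip_a)]
                            + 0.1 * v[step(cell, slip_b)])
                q_values.append(GAMMA * expected)
            new_v[cell] = max(q_values)  # best action's value
        v = new_v
    return v


v = value_iteration()
x1, x2, x3, x4 = v[(0, 1)], v[(1, 1)], v[(1, 2)], v[(2, 2)]
print(f"x1={x1:.4f} x2={x2:.4f} x3={x3:.4f} x4={x4:.4f}")
# x1=0.8658 x2=0.7545 x3=0.5911 x4=0.5320
```
&lt;/pre&gt;
The cells (0, 1), (1, 1), (1, 2), and (2, 2) are one representative each of x1, x2, x3, and x4. So their values should also satisfy the four policy equations for x1 to x4.&lt;br /&gt;
&lt;br /&gt;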
x1 = 0.8(0 + 0.9*1) + 0.1(0 + 0.9*x1) + 0.1(0+ 0.9*x2)&lt;br /&gt;
x1 = 0.72 + 0.09*x1 + 0.09*x2&lt;br /&gt;
&lt;br /&gt;
x2 = 0.8(0 + 0.9*x1) + 0.1(0 + 0.9*x1) + 0.1(0 + 0.9*x3)&lt;br /&gt;
x2 = 0.81*x1 + 0.09*x3&lt;br /&gt;
&lt;br /&gt;
x3 = 0.8(0 + 0.9*x2) + 0.1(0 + 0.9*0) + 0.1(0 + 0.9*x4)&lt;br /&gt;
x3 = 0.72*x2 + 0.09*x4&lt;br /&gt;
&lt;br /&gt;
x4 = 0.8(0 + 0.9*x3) + 0.1(0 + 0.9*x3) + 0.1(0 + 0.9*x3)&lt;br /&gt;
x4 = 0.90*x3</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/758/solve-the-v-values-for-this-mdp-5x5?show=806#a806</guid>
<pubDate>Thu, 04 May 2023 02:50:52 +0000</pubDate>
</item>
</channel>
</rss>