<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0">
<channel>
<title>Algorithms Q&amp;A - Recent questions in MDP</title>
<link>https://notexponential.com/questions/artificial-intelligence/markov-decision-processes</link>
<description>Powered by Question2Answer</description>
<item>
<title>Why is it useful to compute the marginal distribution instead of working with the full joint distribution?</title>
<link>https://notexponential.com/996/compute-marginal-distribution-instead-working-distribution</link>
<description>&lt;p&gt;Suppose we model weather using two variables:&lt;/p&gt;&lt;ul&gt;&lt;li&gt;X: Weather condition (Sunny, Cloudy, Rainy)&lt;/li&gt;&lt;li&gt;Y: Whether people carry an umbrella (Yes, No)&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;You are given the &lt;strong&gt;joint distribution&lt;/strong&gt;:&lt;/p&gt;&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;&lt;/th&gt;&lt;th&gt;Umbrella (Yes)&lt;/th&gt;&lt;th&gt;Umbrella (No)&lt;/th&gt;&lt;th&gt;Total&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Sunny&lt;/td&gt;&lt;td&gt;0.05&lt;/td&gt;&lt;td&gt;0.35&lt;/td&gt;&lt;td&gt;0.40&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Cloudy&lt;/td&gt;&lt;td&gt;0.15&lt;/td&gt;&lt;td&gt;0.15&lt;/td&gt;&lt;td&gt;0.30&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Rainy&lt;/td&gt;&lt;td&gt;0.25&lt;/td&gt;&lt;td&gt;0.05&lt;/td&gt;&lt;td&gt;0.30&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;&lt;td&gt;0.45&lt;/td&gt;&lt;td&gt;0.55&lt;/td&gt;&lt;td&gt;1.00&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;hr&gt;&lt;h3&gt;&lt;strong&gt;Core Question&lt;/strong&gt;&lt;/h3&gt;&lt;p&gt;&lt;strong&gt;If you are only interested in predicting whether people carry umbrellas, regardless of the actual weather:&amp;nbsp;&lt;/strong&gt;&lt;strong&gt;Why is it useful to compute the marginal distribution of umbrella usage instead of working with the full joint distribution? 
What do you gain and what do you lose by marginalizing out the weather variable?&lt;/strong&gt;&lt;/p&gt;&lt;h3&gt;&lt;strong&gt;Some Discussion Points&lt;/strong&gt;&lt;/h3&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Relevance to Modeling&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;In what situations would an AI system care only about P(Y) rather than P(X,Y)?&lt;/li&gt;&lt;li&gt;Is marginalization equivalent to “ignoring causes” in this context?&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Hidden Variables / Latent State&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;If weather were &lt;em&gt;unobserved&lt;/em&gt;, how would marginalization help in modeling behavior?&lt;/li&gt;&lt;li&gt;How does this relate to hidden state models (e.g., HMMs in AI)?&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Decision-Making&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;Suppose a city planner wants to estimate umbrella demand.&lt;/li&gt;&lt;li&gt;Is P(Y) sufficient, or do they need the conditional distribution P(Y|X)?&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Information Loss&lt;/strong&gt;&lt;ul&gt;&lt;li&gt;What important causal relationship disappears when we marginalize out weather?&lt;/li&gt;&lt;li&gt;Could two very different weather patterns produce the same marginal umbrella usage?&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ol&gt;</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/996/compute-marginal-distribution-instead-working-distribution</guid>
<pubDate>Tue, 07 Apr 2026 22:16:23 +0000</pubDate>
</item>
<item>
<title>Evaluate an MDP given several observed episodes</title>
<link>https://notexponential.com/821/evaluate-an-mdp-given-several-observed-episodes</link>
<description>Let&amp;#039;s say you are given a &amp;#039;+&amp;#039;-shaped MDP with five states and a gamma (discount rate) of 1:&lt;br /&gt;
&lt;br /&gt;
Given MDP&lt;br /&gt;
&lt;br /&gt;
_ A _&lt;br /&gt;
B C D&lt;br /&gt;
_ E _&lt;br /&gt;
&lt;br /&gt;
The input policy π is as follows:&lt;br /&gt;
&lt;br /&gt;
A -&amp;gt; Terminal&lt;br /&gt;
B -&amp;gt; C&lt;br /&gt;
C -&amp;gt; D&lt;br /&gt;
D -&amp;gt; Terminal&lt;br /&gt;
E -&amp;gt; C&lt;br /&gt;
&lt;br /&gt;
Suppose you observe the following training episodes, each step listed as state, action, next state, reward:&lt;br /&gt;
&lt;br /&gt;
Episode 1:&lt;br /&gt;
B, east, C, -1&lt;br /&gt;
C, east, D, -1&lt;br /&gt;
D, exit, x, +10&lt;br /&gt;
&lt;br /&gt;
Episode 2:&lt;br /&gt;
B, east, C, -1&lt;br /&gt;
C, east, D, -1&lt;br /&gt;
D, exit, x, +10&lt;br /&gt;
&lt;br /&gt;
Episode 3:&lt;br /&gt;
E, north, C, -1&lt;br /&gt;
C, east, D, -1&lt;br /&gt;
D, exit, x, +10&lt;br /&gt;
&lt;br /&gt;
Episode 4:&lt;br /&gt;
E, north, C, -1&lt;br /&gt;
C, north, A, -1&lt;br /&gt;
A, exit, x, -10&lt;br /&gt;
&lt;br /&gt;
What are the resulting value estimates for each state based on these episodes?</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/821/evaluate-an-mdp-given-several-observed-episodes</guid>
<pubDate>Thu, 11 May 2023 12:58:02 +0000</pubDate>
</item>
<item>
<title>Solve the V* values for this MDP - 5x5</title>
<link>https://notexponential.com/758/solve-the-v-values-for-this-mdp-5x5</link>
<description>&lt;table cellpadding=&quot;2&quot; border=&quot;0&quot; style=&quot;width:100%; border-spacing: 0px;&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td rowspan=&quot;2&quot; style=&quot;text-align:center !important; vertical-align:middle&quot;&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; style=&quot;vertical-align:top; width:100%&quot;&gt;&lt;p&gt;Estimate the final V* values for a given grid world.&lt;/p&gt;&lt;ul&gt;&lt;li style=&quot;list-style-type:inherit&quot;&gt;Discount (gamma) is 0.9.&lt;/li&gt;&lt;li style=&quot;list-style-type:inherit&quot;&gt;Noise: the transition probabilities are 0.8, 0.1, 0.1.&amp;nbsp; That is, when moving in a direction, there is an 80% probability of going in that direction and a 10% probability of going in each of the two perpendicular directions.&lt;/li&gt;&lt;li style=&quot;list-style-type:inherit&quot;&gt;Terminal states are given in the table.&amp;nbsp; All non-terminal states are left blank and need to be solved.&lt;/li&gt;&lt;li style=&quot;list-style-type:inherit&quot;&gt;Living reward (R(s,a,s&#039;)) is 0.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Instead of value iteration, solve the resulting system of equations directly.&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;table cellpadding=&quot;0&quot; border=&quot;1&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;0&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; 
width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;0&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;0&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;0&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;&lt;/p&gt;&lt;/td&gt;&lt;td style=&quot;height:0.5in; width:0.5in&quot;&gt;&lt;p&gt;1&lt;/p&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/758/solve-the-v-values-for-this-mdp-5x5</guid>
<pubDate>Tue, 30 Mar 2021 14:38:01 +0000</pubDate>
</item>
<item>
<title>Jumpy Car - Estimate/Solve the V* values for the following MDP</title>
<link>https://notexponential.com/755/jumpy-car-estimate-solve-the-v-values-for-the-following-mdp</link>
<description>&lt;p&gt;You are traveling on a straight road in a jumpy car.&amp;nbsp; The car sometimes &quot;jumps&quot; (moves two spots instead of one).&amp;nbsp; At other times, it doesn&#039;t move at all.&amp;nbsp; The following MDP models this behavior and the landscape.&lt;/p&gt;&lt;p&gt;Estimate the V* values (optimal values for the states) for this MDP:&lt;/p&gt;&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;S1&lt;/td&gt;&lt;td&gt;S2&lt;/td&gt;&lt;td&gt;S3&lt;/td&gt;&lt;td&gt;S4&lt;/td&gt;&lt;td&gt;S5&lt;/td&gt;&lt;td&gt;S6&lt;/td&gt;&lt;td&gt;S7&lt;/td&gt;&lt;td&gt;S8&lt;/td&gt;&lt;td&gt;S9&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;sqrt(3)&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;100&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;&lt;/td&gt;&lt;td&gt;sqrt(3)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;&lt;p&gt;The MDP is defined as follows: There are two actions, L (Left) and R (Right).&amp;nbsp; When moving left, there is a 40% chance of moving one spot left, a 50% chance of moving DOUBLE (2 spots left), and a 10% chance of not moving at all.&amp;nbsp; Similarly, when moving right, there is a 40% chance of moving one spot right, a 50% chance of moving DOUBLE (2 spots right), and a 10% chance of not moving at all.&lt;/p&gt;&lt;p&gt;S1, S5 and S9 are terminal states with values sqrt(3), 100 and sqrt(3) respectively.&lt;/p&gt;&lt;p&gt;The discount factor is 0.9 (so, gamma = 0.9).&lt;/p&gt;&lt;p&gt;There is no living reward, that is, R(s,a,s’) = 0.&lt;/p&gt;</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/755/jumpy-car-estimate-solve-the-v-values-for-the-following-mdp</guid>
<pubDate>Mon, 01 Mar 2021 23:11:54 +0000</pubDate>
</item>
<item>
<title>Solve the V* values for this MDP</title>
<link>https://notexponential.com/754/solve-the-v-values-for-this-mdp</link>
<description>&lt;p&gt;For the given MDP, find the values for the states S2, S3 and S4. States S1 and S5 are terminal states with values 0 and 1 respectively. The living reward (R) is 0. The transition function is defined as follows: when going Left or Right, there is a 90% probability that the move goes as planned and a 10% probability that no move occurs. The discount rate gamma is 0.9.&lt;/p&gt;&lt;table border=&quot;1&quot; cellpadding=&quot;1&quot; style=&quot;width:500px&quot;&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;S1&lt;/td&gt;&lt;td&gt;S2&lt;/td&gt;&lt;td&gt;S3&lt;/td&gt;&lt;td&gt;S4&lt;/td&gt;&lt;td&gt;S5&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;0&lt;/td&gt;&lt;td&gt;x2&lt;/td&gt;&lt;td&gt;x3&lt;/td&gt;&lt;td&gt;x4&lt;/td&gt;&lt;td&gt;1&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;</description>
<category>MDP</category>
<guid isPermaLink="true">https://notexponential.com/754/solve-the-v-values-for-this-mdp</guid>
<pubDate>Tue, 23 Feb 2021 22:40:41 +0000</pubDate>
</item>
</channel>
</rss>