What is the “Hotel Problem” and how does it affect the interpretation of data?
The Hotel Problem can be particularly confusing when evaluating new versus returning users. For example, Localytics counts a user as new only the first time they open your app. Therefore, a user who opens your app for the first time on a given day and returns again the same day will be counted once as a new user and once as a returning user. This means that if you split the number of users from a given day between new and returning, the two counts added together will likely exceed the number of unique users from that day (because some users are correctly identified as both new and returning).
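The overlap can be illustrated with a minimal sketch. The session log, user names, and the `is_first_ever_session` flag below are hypothetical and are not part of any Localytics API; they only mimic the counting rule described above.

```python
# Hypothetical session log for a single day: (user_id, is_first_ever_session).
# This is illustrative data, not a Localytics export format.
sessions = [
    ("alice", True),   # alice opens the app for the first time...
    ("alice", False),  # ...and comes back later the same day
    ("bob", False),    # bob has used the app before
    ("carol", True),   # carol opens the app for the first time
]

# A user is "new" if any session that day was their first ever,
# and "returning" if any session that day was not their first.
new_users = {user for user, first in sessions if first}
returning_users = {user for user, first in sessions if not first}
unique_users = {user for user, _ in sessions}

print("new:", len(new_users))
print("returning:", len(returning_users))
print("new + returning:", len(new_users) + len(returning_users))
print("unique:", len(unique_users))
print("counted in both groups:", sorted(new_users & returning_users))
```

In this sketch, alice lands in both groups, so the new and returning counts add up to one more than the number of unique users for the day.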
A variation of this problem appears when counting users who have enabled or disabled push messaging. Over time, a user can enable or disable your ability to send them push messages in your app. Because our SDK records whether a user has push enabled on every event, a user can show as both enabled and disabled if the report time period covers the moment they switched. A discrepancy can also occur in the number of users that push messages are sent to, because known and anonymous users are counted separately. For example, if a user enables push messaging while anonymous and later disables push messaging while known, the user will technically be counted as two different users with different push enabled settings.
Let's say you're looking at a Localytics report covering the month of April. A user who had push disabled on April 1 decided to enable it on April 15. If you filter the April 30-day report by Push Enabled, that user would be counted for April. If you remove that filter and instead filter by Push Disabled, they would again be counted for April, since that user was both enabled and disabled over that time period.
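The April scenario above can be sketched in a few lines. The event tuples, the year, the user IDs, and the `users_matching` helper are all assumptions made for illustration; the sketch only mirrors the idea that the push setting is recorded on every event and that a filter matches any user with at least one matching event in the period.

```python
from datetime import date

# Hypothetical events: (user_id, event_date, push_enabled_at_event_time).
# The year and user IDs are made up for the example.
events = [
    ("u1", date(2024, 4, 1), False),   # u1 has push disabled early in April
    ("u1", date(2024, 4, 20), True),   # u1 enabled push on April 15, seen here after the switch
    ("u2", date(2024, 4, 3), True),    # u2 has push enabled all month
]

def users_matching(events, push_enabled):
    """Return users with at least one event carrying the given push setting."""
    return {user for user, _, enabled in events if enabled == push_enabled}

enabled_users = users_matching(events, True)
disabled_users = users_matching(events, False)
all_users = {user for user, _, _ in events}

print("Push Enabled filter:", sorted(enabled_users))
print("Push Disabled filter:", sorted(disabled_users))
print("Unique users in April:", len(all_users))
```

Here u1 matches both filters, so the two filtered counts add up to three even though only two unique users were active in April.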
The Hotel Problem will also create a discrepancy between unique users and total users over time. For example, when looking at Users by Day, the dashboard displays the number of unique users for each specific day. However, the summary bar at the top of the screen shows unique users across the entire time period, and users will inevitably overlap because they have sessions on different days. So, the Users by Day Total column is the sum of each day's unique users rather than the number of unique users across the selected time range, which is a perfect example of the "Hotel Problem."
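The difference between the summed daily column and the summary bar can be shown with a small sketch. The dates, user names, and variable names are hypothetical; this is not how the dashboard is implemented, only an illustration of why the two numbers differ.

```python
from collections import defaultdict

# Hypothetical sessions: (day, user_id). Illustrative data only.
sessions = [
    ("2024-04-01", "alice"),
    ("2024-04-01", "bob"),
    ("2024-04-02", "alice"),
    ("2024-04-03", "alice"),
    ("2024-04-03", "carol"),
]

# Group users by day, keeping each user at most once per day.
users_by_day = defaultdict(set)
for day, user in sessions:
    users_by_day[day].add(user)

# "Users by Day": unique users within each individual day.
daily_counts = {day: len(users) for day, users in users_by_day.items()}

# Summing the daily column counts a user once per day they were active.
sum_of_daily = sum(daily_counts.values())

# The summary-bar style figure: unique users across the whole range.
unique_in_range = len(set().union(*users_by_day.values()))

print("daily counts:", daily_counts)
print("sum of daily counts:", sum_of_daily)
print("unique users in range:", unique_in_range)
```

Because alice is active on all three days, she contributes three to the summed daily column but only one to the range-wide unique count.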