Here's how I think about this from a somewhat simplified model of the end-to-end system. We have the input and output devices (mouse/keyboard/controller & monitor), the game engine (takes inputs from the devices, renders output on monitor), and the multiplayer game server. Let's call the links between these "L" (for local I/O to game engine) and "M" (for game engine to multiplayer server).
I/O <-- L --> GE <-- M --> MS
In a typical console or PC scenario, the "L" link is pretty fast (for PC, very fast; for consoles driving a TV, still fast, but not as fast as a PC). Call it a few tens of milliseconds.
But in that scenario, the "M" link may be quite bad, since it includes your local internet service and one or more internet transit links before reaching the multiplayer server. Call it anywhere from a few tens of milliseconds up to a few hundred milliseconds. In that environment, any movements the player makes are visible on screen quickly, since they only require a round trip through the "L" link. However, any game state changes influenced by other players may take a long time to become visible, since those require a round trip through the "M" link (in fact, it's worse, since they require both your "M" link and the other players' "M" links to be transited). This leads to weird outcomes like thinking you got the drop on someone, only for the multiplayer server to decide they actually saw you first. It also causes "rubber banding," where your view of another player jumps from one spot to another as the whole distributed system becomes eventually consistent.
Now, in the Stadia scenario, the "L" link is not as fast as it would be for a PC, since we're going through the local internet service (and more, but that gets really complicated and dependent on your local ISP). It should still be a few tens of milliseconds, but more than a PC (and maybe comparable to a local console of the last generation). Call it 100ms to 150ms to be conservative.
The "M" link, though, is going to be very fast, since it stays entirely inside Google's own network, or at worst transits the high-speed links Google maintains to all of the major cloud hosting regions. That should be tens of milliseconds at worst for most players, and may be under ten milliseconds pretty commonly. So you may not see your own movements quite as quickly as you would in a local setup, but the number of times you see things roll back or rubber band should be very small, and as @RXShorty said, people can adapt to high latency pretty well. However, this assumes everyone playing has similarly small "M" link latencies. In a cross-play environment, some people may have extremely short "M" latencies, while others (non-Stadia players) may have much longer ones. In a mixed environment like that, you will still see some rubber banding, but being a Stadia player may provide an advantage in how often it happens.
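To make the trade-off concrete, here's a rough sketch in Python. The millisecond numbers are made-up illustrative assumptions (not measurements); the point is only how shifting latency between the "L" and "M" links changes what you feel:

```python
# Illustrative numbers only -- not measurements. The idea is that PC and
# Stadia put their latency on opposite links of the I/O <--L--> GE <--M--> MS model.

def own_action_feedback(l_ms):
    """Time to see your own movement: a round trip over the L link."""
    return 2 * l_ms

def cross_player_delay(l_ms, m_self_ms, m_other_ms):
    """Time for another player's action to show on your screen:
    their M link, then your M link, then your L link (one way each)."""
    return m_other_ms + m_self_ms + l_ms

# Hypothetical link latencies in milliseconds:
scenarios = {
    "PC":     {"L": 20,  "M": 80},   # fast local link, slow internet hop
    "Stadia": {"L": 125, "M": 10},   # slower local link, fast datacenter hop
}

for name, s in scenarios.items():
    fb = own_action_feedback(s["L"])
    xp = cross_player_delay(s["L"], s["M"], s["M"])  # assume symmetric peers
    print(f"{name}: own move ~{fb}ms, peer's move ~{xp}ms")
```

With these assumed numbers, PC shows your own inputs faster (~40ms vs ~250ms round trip), while Stadia shows other players' actions sooner (~145ms vs ~180ms), which is why rollbacks should be rarer there.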
Here's the thing, though. The real model looks like this:
Player 1 <--> I/O1 <-- L1 --> GE1 <-- M1 --> MS <-- M2 --> GE2 <-- L2 --> I/O2 <--> Player 2
So, the end-to-end latency that matters in a lot of cases is really Player 1 <--> Player 2, and while Stadia changes the relative ratios of L/M link latencies, it doesn't actually change the total end-to-end latency by much. If you're playing against someone very far away who has craptastic internet, it's not going to be a fun time for either of you, regardless of which technology stack is underneath.
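The chain above can be summed directly. Again with made-up numbers (assumptions, not measurements), a quick sketch shows that moving latency from "M" to "L" barely changes the player-to-player total:

```python
def end_to_end(l1, m1, m2, l2):
    """One-way Player 1 -> Player 2 latency from the model:
    I/O1 --L1--> GE1 --M1--> MS <--M2--> GE2 <--L2--> I/O2."""
    return l1 + m1 + m2 + l2

# Hypothetical symmetric pairs; the latency just lives on different links:
pc_pair     = end_to_end(20, 80, 80, 20)    # latency mostly in the M links
stadia_pair = end_to_end(125, 10, 10, 125)  # latency mostly in the L links
print(pc_pair, stadia_pair)  # 200 vs 270 -- same ballpark either way
```

So under these assumptions, the stack shifts where the delay sits, not how much of it there is between the two players.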
Oof – this has gotten much longer than I'd planned. I didn't even get into the speed of typical people's photon-to-finger reaction time or signal propagation delays over distance. But, I hope this was interesting anyway!