8 versus 16 bytes barely matters for actually using the addresses. If you're assigning IPs to your own devices, you can make the second half of the address start with 6-7 zero bytes and collapse them all with "::".
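To illustrate the zero-collapsing point, here is a minimal sketch using Python's standard `ipaddress` module. The prefix is an assumption for illustration only, taken from the 2001:db8::/32 documentation range, and the host number is made up.

```python
import ipaddress

# Hypothetical /64 from the documentation range (not a real deployment).
prefix = ipaddress.IPv6Network("2001:db8:0:1::/64")

# Statically assign a host whose interface ID is seven zero bytes
# followed by a single non-zero byte.
host = prefix.network_address + 0x01

print(host.exploded)    # 2001:0db8:0000:0001:0000:0000:0000:0001
print(host.compressed)  # 2001:db8:0:1::1
```

The compressed form is what you would actually type or read aloud; the run of zero groups disappears into the "::".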
And I challenge you to name a way to be "somewhat backward compatible" that would actually work and that IPv6 doesn't already do.
The design of IPv6 is for computers, not for humans. How do you even say an IPv6 address aloud? You need to be able to communicate "192 dot 168 dot 50 dot 1" over a voice medium.
That has very little to do with 8 versus 16 bytes.
Edit: Not only can you keep your own addresses short. If I look up some IPv6 addresses that are meant to be said or remembered (public DNS IPs), none of them make you type more than 8 bytes (and the 8-byte one repeats a cluster to make it easier), and some make you type as little as 4 bytes.
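A rough way to check the "bytes you actually type" claim is to count the hex digits left in the compressed form. The sketch below does that; the helper name `typed_bytes` and the choice of sample addresses (Google and Quad9 public DNS) are my own for illustration, not something the comment specifies.

```python
import ipaddress

def typed_bytes(addr: str) -> int:
    """Bytes' worth of hex digits in the compressed textual form."""
    compressed = ipaddress.IPv6Address(addr).compressed
    # Splitting on ":" leaves an empty string where "::" was; it adds 0 digits.
    hex_digits = sum(len(group) for group in compressed.split(":"))
    return (hex_digits + 1) // 2  # round an odd digit count up to a whole byte

for name, addr in [("Google", "2001:4860:4860::8888"),
                   ("Quad9", "2620:fe::fe")]:
    print(name, addr, typed_bytes(addr))
```

With these two samples the counts come out to 8 and 4 bytes respectively, which lines up with the range described above.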
The extra space means you never have to calculate subnet sizes and you can let devices handle their own IPs. I think that's a pretty good tradeoff.
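As a sketch of what "calculating subnet sizes" means in practice: with IPv4 you pick a prefix length per network based on how many hosts you expect, while an IPv6 LAN is conventionally just a /64. The helper `ipv4_prefix_for` is hypothetical, it assumes the usual convention of reserving the network and broadcast addresses, and the IPv6 prefix again comes from the documentation range.

```python
import ipaddress

def ipv4_prefix_for(hosts: int) -> int:
    """Longest IPv4 prefix whose subnet still fits `hosts` usable addresses."""
    # Smallest n with 2**n >= hosts + 2 (network and broadcast are reserved).
    host_bits = (hosts + 1).bit_length()
    return 32 - host_bits

# IPv4: the answer changes with the expected host count.
for hosts in (50, 200, 1000):
    print(f"{hosts} hosts -> /{ipv4_prefix_for(hosts)}")

# IPv6: one /64 per LAN, regardless of how many devices join it.
lan = ipaddress.IPv6Network("2001:db8:0:1::/64")
print(lan.num_addresses == 2**64)  # True
```

The fixed 64-bit host part is also what leaves room for devices to pick their own interface identifiers.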
64 bits are already a pain in the ass to remember, and if you have specific memorization needs you can use small static IPs so that even with 128 bits available you only use about 64 of them.
It's a good strategy if deploying a bigger address space is easy and cheap. When it's incredibly difficult and time-consuming, you should pause and consider it a bit more carefully.
New L3 protocols on the Internet are firmly on the "incredibly difficult and time-consuming" side.
Diceware is way easier to share over the phone than any IPv6 address (except for the few vanity ones like Google's 2001:4860:4860::8888, and for those it's only slightly easier).