// Copy the Entire Framebuffer to itself,
// to fix the missing pixels.
// Not sure why this works.
That has big "lack of proper cache flushing" energy. ARM-A device support tends to be where you need to get really intentional about managing your memory hierarchy. Smaller cores tend to have simple enough (or no) caches that they don't get in your way much, except for knife-edge bugs. Bigger systems like x86 tend to push cache coherency out even to IO devices. ARM-A class SoCs hit that sweet spot of a ton of caches between the CPU and main memory, but peripherals and fabric simple enough that only the CPU cores are coherent.
Yep, and I am not comfortable using software developed by people who don't know about this. The ARMv8 TRM mentions cache considerations almost everywhere.
Well, I am. Hacky workarounds like this might be ugly and slow, but modern chips are really fast, and I don't expect an RTOS graphics display to be the fastest thing in the world.
Let me tell you one thing: Even modern chips are not fast enough to pull stupid stunts like that. This is topped off with a horrible routine which manually copies the whole thing bytewise (= 1/4th of a pixel per loop). Using SIMD (I think it is called ASIMD or something) you can transfer like 16 pixels in 32bpp per loop, just to give you an idea (i.e. 64x the amount of data).
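To make the "64x" concrete, here is a minimal sketch that just counts loop iterations. It is not the routine from the article and it does not use real NEON intrinsics: `copy_bytewise`, `copy_chunked` and `CHUNK` are hypothetical names, and the fixed-size 64-byte `memcpy` is only a portable stand-in that compilers commonly lower to wide vector loads and stores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 64 bytes per iteration, i.e. 16 pixels at 32bpp. */
#define CHUNK 64u

/* One byte (1/4 of a 32bpp pixel) per loop iteration.
 * Returns the number of iterations executed. */
static size_t copy_bytewise(uint8_t *dst, const uint8_t *src, size_t len)
{
  assert(dst != NULL && src != NULL);
  size_t iters = 0;
  for (size_t i = 0; i < len; i++)
    {
      dst[i] = src[i];
      iters++;
    }
  return iters;
}

/* CHUNK bytes per loop iteration, plus one extra iteration for any tail.
 * Returns the number of iterations executed. */
static size_t copy_chunked(uint8_t *dst, const uint8_t *src, size_t len)
{
  assert(dst != NULL && src != NULL);
  size_t iters = 0;
  size_t i = 0;
  for (; i + CHUNK <= len; i += CHUNK)
    {
      memcpy(dst + i, src + i, CHUNK);
      iters++;
    }
  if (i < len)
    {
      memcpy(dst + i, src + i, len - i);
      iters++;
    }
  return iters;
}
```

For a 4096-byte buffer the first version loops 4096 times and the second 64 times. Neither version fixes the actual problem, of course; it only shows how much loop overhead the bytewise variant adds.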
Or to say it differently: 1440x720x32bpp is about 4MiB of memory which needs to be read and then written again, per frame. With 60FPS that is 8MiB*60FPS = 480MiB/s of data you shovel around for no reason. Add that loop overhead from this 1/4th of a pixel per loop routine and your awesome fast modern chip is being kept busy with garbage, eating your battery while doing so.
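If you want to check the numbers, a quick sketch of the arithmetic follows. The function names are made up for illustration, and it assumes the whole framebuffer is read once and written once per frame.

```c
#include <assert.h>
#include <stdint.h>

/* Size of one frame in bytes. Assumes bpp is a whole number of bytes. */
static uint64_t frame_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
  assert(bpp % 8 == 0);
  return (uint64_t)width * height * (bpp / 8);
}

/* Bytes per second moved by copying the framebuffer onto itself:
 * every frame is read once and written once. */
static uint64_t self_copy_traffic(uint32_t width, uint32_t height,
                                  uint32_t bpp, uint32_t fps)
{
  return 2 * frame_bytes(width, height, bpp) * fps;
}
```

`frame_bytes(1440, 720, 32)` is 4,147,200 bytes (a bit under 4 MiB), and `self_copy_traffic(1440, 720, 32, 60)` is 497,664,000 bytes/s, roughly 475 MiB/s. So the 480 MiB/s figure is slightly rounded up, but the order of magnitude holds.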
Now, you want to avoid updating the whole screen all the time anyway, but there is obviously a more fundamental issue here, and it should be fixed properly instead of keeping the CPU busy with nonsense. Also keep in mind that this may block or delay other data transfers in the system, depending on the system, the bus arbitration settings, etc.
Exactly. The artifacts shown are stereotypical caching issues: always the same length (the cache line size), and entries written earlier are more likely to be fully committed than ones written later, but overall it is (seemingly) random.
Even my x86 ThinkPad has had iGPU driver bugs surrounding LLC flushing with Linux kernel updates over the years. They'd manifest as cache-line-sized graphics noise/artifacts, particularly when showing fullscreen animations.
That one I have on good authority is a hardware bug. PCI devices are supposed to be cache coherent, but Intel got a little too cute when integrating their iGPU into the uncore (and the northbridge before it) at a level that looks like PCI to software but is actually their internal interconnect.
You manually flush the addresses of the framebuffer via a cache maintenance instruction.
The internet tells me the phone has an A53, which should be ARMv8, so probably a loop of DC CIVAC (Data Cache Clean and Invalidate by Virtual Address to Point of Coherency) instructions over the framebuffer. Though the buffer might be large enough to not even fit in the cache, so it might be more efficient to just flush the entire cache.
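A rough sketch of what such a loop could look like, in case it helps. This is not NuttX's actual code: `dcache_line_size` and `fb_clean_invalidate` are hypothetical names, the 64-byte fallback line size is an assumption for non-ARM hosts, and on non-AArch64 builds the function only counts lines without touching any cache. It also assumes the code runs at a privilege level where `DC CIVAC` is permitted (EL1, or EL0 with `SCTLR_EL1.UCI` set).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Smallest data cache line size in bytes.
 * On AArch64 this comes from CTR_EL0.DminLine (bits 19:16), which holds
 * log2 of the line size in 4-byte words. */
static size_t dcache_line_size(void)
{
#if defined(__aarch64__)
  uint64_t ctr;
  __asm__ volatile("mrs %0, ctr_el0" : "=r"(ctr));
  return (size_t)4 << ((ctr >> 16) & 0xf);
#else
  return 64; /* Assumption: placeholder so the sketch builds elsewhere. */
#endif
}

/* Clean and invalidate every cache line overlapping [buf, buf + len).
 * Returns the number of lines visited. */
static size_t fb_clean_invalidate(const void *buf, size_t len)
{
  size_t line = dcache_line_size();
  assert(line != 0 && (line & (line - 1)) == 0); /* power of two */

  if (len == 0)
    {
      return 0;
    }

  /* Round the start down to a line boundary so a partial first line
   * is covered as well. */
  uintptr_t p = (uintptr_t)buf & ~(uintptr_t)(line - 1);
  uintptr_t end = (uintptr_t)buf + len;
  size_t lines = 0;

  for (; p < end; p += line)
    {
#if defined(__aarch64__)
      __asm__ volatile("dc civac, %0" : : "r"(p) : "memory");
#endif
      lines++;
    }

#if defined(__aarch64__)
  /* Wait for the maintenance operations to complete before the display
   * hardware is allowed to read the buffer. */
  __asm__ volatile("dsb sy" : : : "memory");
#endif
  return lines;
}
```

You would call something like `fb_clean_invalidate(fb, fb_size)` after rendering and before the display engine fetches the frame. With a 64-byte line, a roughly 4 MiB framebuffer means about 65,000 maintenance instructions per frame, which is where the "maybe just flush the whole cache" trade-off comes from.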
Copying it over itself is not the only solution: all you need to do is issue the right sequence of cache flushes over the affected area (first the CPU cache flush instructions, then maybe an additional set of MMIO-based cache flushes, depending on how your L2 and/or L3 work). Worse, copying it over itself isn't even enough to be correct. They lucked out that it seems to work, but that's kind of by chance. The cache is well within its rights to leave some lines unflushed even after this big copy.
As cool as this is, Genode is much farther ahead[0] and has a much stronger security model (microkernel multiserver with capabilities).
AIUI they're planning to provide user-installable images by next release (2023-02).
It was gonna be 2022-11, but they chose to delay so that end users only see it polished. It is possible to try it by building it yourself, and it's pretty cool.
Practically speaking, "farther ahead" may mean different things to different people. I tried to find a list of boards, systems on a chip (SoCs), or CPUs that Genode supports. There's no master list [1], but you can go digging through the handful of architecture-specific GitHub repos. [2] The number of actually supported SoCs and boards is very limited.
NuttX and Zephyr both support a large number of SoCs and boards [3][4], and they each have a single git repo with a configuration system that lets you build different boards from that single repo. In terms of practical ease of use for hobby and commercial projects, I would say these projects are both far ahead of Genode if your hardware is not supported by Genode and is supported by either NuttX or Zephyr.
In my opinion NuttX is like a secret weapon. RTOS, good hardware support for many boards and CPUs, and a very high performance TCP/IP stack that can be interrupted by hard realtime tasks.
It's also growing in popularity now, though I would say it hasn't reached mainstream popularity yet. :)
What is the NuttX community like compared to Zephyr? Seems like they have similar goals and I noticed that the latter has ten times the number of open pull requests.
NuttX was the project of a single guy, over 20 years, which only recently got picked up by bigger players (e.g. Xiaomi) when the project was moved to Apache Foundation governance. Before that it was very, very quirky, and while it was possible to do a lot with it, doing so was also very effort-intensive. I would choose Zephyr just because it has less technical debt.
Maybe choose it if your embedded board, system on a chip, or CPU is supported. NuttX has a lot of supported boards, SoCs, and CPUs, and they differ from Zephyr's supported hardware.