Looking at the circumvention techniques GoodbyeDPI uses makes me want to cry. Is this really the state of DPI in 2022, where changing Host to hoSt or adding whitespace between the method and the URI actually works?
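To make the point concrete, here is a minimal sketch of why those two tricks can defeat a filter that matches raw bytes. The function names and the blocked host are made up for illustration, and this is not a claim about how any particular DPI product is written.

```python
BLOCKED_HOST = b"blocked.example"  # hypothetical blocklist entry
NEEDLE = b"\r\nHost: " + BLOCKED_HOST


def naive_filter(payload: bytes) -> bool:
    """Exact byte matching: cheap per packet, but brittle."""
    return payload.startswith(b"GET /") and NEEDLE in payload


def tolerant_filter(payload: bytes) -> bool:
    """Parses the request line and headers instead of matching bytes."""
    head = payload.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    # split() with no argument collapses runs of whitespace.
    if len(lines[0].split()) != 3:
        return False
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        # HTTP header names are case-insensitive.
        if sep and name.strip().lower() == b"host":
            return value.strip().lower() == BLOCKED_HOST
    return False


SAMPLES = {
    "normal": b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n",
    "mixed case": b"GET / HTTP/1.1\r\nhoSt: blocked.example\r\n\r\n",
    "extra space": b"GET  / HTTP/1.1\r\nHost: blocked.example\r\n\r\n",
}

for label, payload in SAMPLES.items():
    print(label, naive_filter(payload), tolerant_filter(payload))
```

The tolerant version does more work per request, which is the cost the replies below are arguing about.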
I suspect it is also due to the scale at which DPI is used; every additional bit of complexity quickly adds up to increase the amount of processing power required.
It's not so much about talent as about tradeoffs. I work on traffic monitoring tools (not censorship, just observability tools for infrastructure), and there's always the decision of how many edge cases you want to cover vs. how fast you want your tool to go. At millions of packets per second, an extra "if" might make a big difference in the throughput you're able to monitor. So maybe it's actually reasonable to ignore the 0.1% that use "hoSt" instead of "Host" to avoid losing 0.5% of the packets.
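A rough back-of-the-envelope sketch of that budget follows. The link speed and frame size are illustrative assumptions, not figures from this thread, and the calculation pretends a single core sees every packet.

```python
def per_packet_budget_ns(link_bps: float, frame_bytes: int) -> float:
    """Time available per packet if every frame on the link is inspected."""
    # On the wire each Ethernet frame also costs 8 bytes of preamble/SFD
    # and a 12-byte inter-frame gap.
    wire_bits = (frame_bytes + 20) * 8
    packets_per_second = link_bps / wire_bits
    return 1e9 / packets_per_second


# Assumed worst case: 10 Gbit/s of minimum-size (64-byte) frames.
budget = per_packet_budget_ns(10e9, 64)
print(f"{budget:.1f} ns per packet")
```

With those assumptions the budget is on the order of tens of nanoseconds per packet, so even one extra branch or cache miss per packet is a measurable fraction of it.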
I suspect it's not even about not being able to; there's just very little motivation. I had brief contact with Zscaler, who operate approximately in this area of traffic inspection: they literally have no clue and they don't care. Their service can be hot flaming trash but people will still pay them money because they check some boxes. I'm sure it applies to other companies in the same area as well.
It is a harder problem than it sounds. Deep packet inspection needs to happen at some line speed. The more work you do, the harder it is to process it all fast enough. You can write stuff for single packets, but when you have a lot of connections happening it becomes a much harder problem.
Isn’t this stuff typically specially built HW? I feel like an ASIC can accelerate this stuff fairly quickly, although the volumes/pricing may not warrant building one. Also, if you’re matching on host name there’s no reason you even need to keep up with line rate. All you need to do is keep up with the connection establishment rate; you can always do the processing in the background and just issue a TCP reset after the fact.
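A minimal sketch of that out-of-band idea, assuming plain HTTP and a caller that only hands over data-bearing segments. The class, the flow IDs, and the blocklist are hypothetical; a real box would inject a TCP RST where this sketch just records the flow.

```python
from collections import deque

BLOCKED_HOSTS = {"blocked.example"}  # hypothetical blocklist


def extract_host(payload: bytes):
    """Returns the lower-cased Host header value, or None if absent."""
    for line in payload.split(b"\r\n")[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", "replace").lower()
    return None


class OutOfBandInspector:
    def __init__(self):
        self.seen_flows = set()
        self.pending = deque()
        self.to_reset = []

    def fast_path(self, flow_id, payload: bytes) -> None:
        """Per-packet work: copy only the first payload of each flow."""
        if flow_id not in self.seen_flows:
            self.seen_flows.add(flow_id)
            self.pending.append((flow_id, payload))
        # The packet is forwarded either way; nothing is parsed here.

    def background_step(self) -> None:
        """Runs off the forwarding path and picks flows to reset later."""
        while self.pending:
            flow_id, payload = self.pending.popleft()
            if extract_host(payload) in BLOCKED_HOSTS:
                self.to_reset.append(flow_id)  # stand-in for sending a RST


inspector = OutOfBandInspector()
inspector.fast_path("flow-a", b"GET / HTTP/1.1\r\nHost: blocked.example\r\n\r\n")
inspector.fast_path("flow-a", b"more data on the same flow")
inspector.fast_path("flow-b", b"GET / HTTP/1.1\r\nHost: fine.example\r\n\r\n")
inspector.background_step()
print(inspector.to_reset)
```

The parsing cost then scales with new connections rather than with packets, at the price of letting the first packets of a blocked flow through.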
I actually work in the field (networking), and FPGAs are very common in professional telecommunications equipment, hence my suspicion/guess that DPI boxes are the same, especially since I'm also guessing that this is the sort of thing that gets updated often. So I think my 'suspicion' is at least as good as yours.
I worked in a DPI/firewall company and my work ran on the ASIC accelerator, so nah, my 'guess' is probably better.
An FPGA is not worth the trouble. You get neither the (line) speed of an ASIC nor the flexibility of running everything on the CPU. Most serious DPI hardware vendors have stopped using them.
But you are right that it's no fun trying to work around ASIC bugs.
Well you made a laconic, non-substantive reply so you ought to expect pushback.
FPGAs allow near-ASIC speeds with effectively the flexibility of software, in that they can be updated via firmware upgrades, with much cheaper development costs than ASICs. They do have a higher unit cost than ASICs, but that only matters at high volume. For anything that is 'low' volume, an ASIC may not make financial sense at all in any case.
I am no expert in DPI specifically but Google suggests that using FPGAs for DPI is an active commercial topic.
You can get really far with cheap techniques when your goal is to dissuade. The bigger concern I’d have is statistical analysis of the top offenders.
Every OSI layer offers more bypass techniques, and covering them all is like the halting problem: your goal is to get value without making everything break when a new browser comes out. You can’t cover all options as a 3rd party and get it perfect.
The higher up toward the application layer you go, the easier it is to bypass. The more you try to classify without impact (DPI, IDS, WAF, spam, AV), the easier bypasses are.
The domains that have become effective, like spam filtering, have quicker feedback loops. Network middleboxes have the slowest response cycle, to the point where they are explicitly called out in RFCs.
<script> in a URL might get blocked but <script >… won’t, because it’s string matching and not layer-aware.
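A small sketch of that difference, using Python's standard `html.parser` as a stand-in for a layer-aware check. It assumes the payload has already been URL-decoded, and the function names are made up for illustration.

```python
from html.parser import HTMLParser


def naive_blocks(text: str) -> bool:
    """Pure string matching: only the exact spelling is caught."""
    return "<script>" in text


class TagCollector(HTMLParser):
    """Records the name of every start tag the parser recognises."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


def parser_blocks(text: str) -> bool:
    """Asks an HTML parser what the markup means, then checks for script."""
    collector = TagCollector()
    collector.feed(text)
    collector.close()
    return "script" in collector.tags


for sample in ("<script>alert(1)</script>", "<script >alert(1)</script>"):
    print(repr(sample), naive_blocks(sample), parser_blocks(sample))
```

The string match misses the variant with a space before `>`, while the parser treats both as the same tag.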
Most of the engines out there weren't made for security but for performance. It's disturbing and relaxing at the same time to see how easy it is to bypass them. Something that works 100% is to multiplex a channel, changing its protocol after some packets. You do the SSL handshake, then after some amount of time you switch to SSH. I think something like https://github.com/yrutschle/sslh (I couldn't find the real repository that I used, but that one looks similar) could be used after the detection to bypass filters.
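A sketch of why switching protocols mid-connection can slip past a classifier that labels a flow once from its first bytes. This is not sslh's code; the classifier class and flow IDs are hypothetical, and the only protocol facts relied on are that a TLS handshake record starts with 0x16 0x03 and an SSH identification string starts with "SSH-".

```python
def classify_first_bytes(data: bytes) -> str:
    """Guesses the protocol from the leading bytes of a payload."""
    if data.startswith(b"SSH-"):
        return "ssh"
    if len(data) >= 3 and data[0] == 0x16 and data[1] == 0x03:
        return "tls"
    return "unknown"


class FirstPacketClassifier:
    """Labels each flow from its first payload and never looks again."""

    def __init__(self):
        self.labels = {}

    def observe(self, flow_id, payload: bytes) -> str:
        if flow_id not in self.labels:
            self.labels[flow_id] = classify_first_bytes(payload)
        return self.labels[flow_id]


classifier = FirstPacketClassifier()
# The flow opens with something that looks like a TLS handshake record...
first = classifier.observe("flow-1", b"\x16\x03\x01\x00\x05hello")
# ...and later carries an SSH banner, which the classifier never re-checks.
later = classifier.observe("flow-1", b"SSH-2.0-example\r\n")
print(first, later)
```

Re-classifying every packet would close this gap, but that is exactly the per-packet cost discussed earlier in the thread.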
No, unfortunately almost none of these naive methods work any longer. However, protocol spoofing ("fake packet" in GoodbyeDPI) with Auto-TTL is pretty effective on most ISPs in Russia, Korea, Indonesia, and Turkey.
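For readers unfamiliar with the idea, here is a rough sketch of how a TTL for a fake packet could be chosen so that it passes the inspection box but expires before reaching the server. This is not GoodbyeDPI's actual algorithm. The function names and the `margin` parameter are hypothetical, and it assumes the server started from one of the common initial TTLs (64, 128, 255) and that the path is roughly symmetric.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)


def estimate_hops(observed_ttl: int) -> int:
    """Estimates hop distance from the TTL seen on a packet from the server."""
    if not 1 <= observed_ttl <= 255:
        raise ValueError("TTL must be between 1 and 255")
    # Assume the sender used the smallest common initial TTL that fits.
    initial = min(t for t in COMMON_INITIAL_TTLS if t >= observed_ttl)
    return initial - observed_ttl


def fake_packet_ttl(observed_ttl: int, margin: int = 2) -> int:
    """Picks a TTL a few hops short of the server (margin is an assumption)."""
    return max(1, estimate_hops(observed_ttl) - margin)


# Example: a SYN/ACK arrives with TTL 52, suggesting about 12 hops.
print(estimate_hops(52), fake_packet_ttl(52))
```

If the estimate is too low the fake packet dies before the DPI box and has no effect; if it is too high the fake reaches the server and may break the connection, which is why a fixed TTL is fragile across different servers.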