
Google, Apple, and Meta (maybe others?) have the data to build a complete GeoIP dataset. None of them will share because there are only downsides to doing so.

When FB was rolling out IPv6 in 2012, well-meaning engineers proposed releasing a v6-only GeoIP db (at the time, the public dbs were shit). Not surprisingly, it was shot down.



We are always happy to work with large technology enterprises and streaming platforms, not necessarily to sell, but to share insights, data, and practical advice. We observe the entire internet through active measurements, and we are open to co-publishing research when it benefits the broader ecosystem.

Google/GCP is top of mind for me due to a recent engineering ticket. Some of our own infrastructure is hosted on GCP, and Google’s device-based IP geolocation model causes issues for internet users, particularly for IPv6 services.

From what we understand, when a large number of users from a censored country use a specific VPN provider, Google's device-based signals can bias the geolocation of entire IP ranges toward that country. This has direct consequences for accessibility to GCP-hosted services. We have seen cases where providers with German-based data centers were suddenly geolocated to a random country with strict internet censorship policies, purely due to device-based inference rather than network reality. Our focus is firmly on the geolocation of exit-node IPs, backed by network evidence.

https://community.ipinfo.io/t/getting-403-forbidden-when-acc...

We are actively looking to connect with someone at Google/GCP, Azure/Microsoft and others who would be willing to speak with us, or directly with our founder.

Our community consistently asks us to partner more deeply with enterprises because we are in constant contact with end users and network operators. To be honest, we do not even get many questions or issues. We are partners with a large CDN company, and I get about one message a month, which usually involves sharing evidence data rather than fixing something.

From a large-scale organization's perspective, IP geolocation should not be treated as an internal project. It is a service. Delivering it properly requires the full range of engineering, sales, support, and personnel available around the clock to engage with users, evaluate evidence, and continuously incorporate feedback.


> From what we understand, when a large number of users from a censored country use a specific VPN provider, Google's device-based signals can bias the geolocation of entire IP ranges toward that country.

Yep, this is a known effect.

How it seems to work: Google uses Android phones as data-harvesting probes. When it sees that many devices in a given IP range pick up GPS fixes, Wi-Fi APs, or cell tower IDs known to be located in Iran (possibly combined with other cues like ping times to client devices, device languages, timezones, or search request contents), the system infers "there's a network wormhole here with Iran on the other end", and the entire IP range grows legs and drifts towards Iran.
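The inference described above can be sketched roughly as: aggregate device-side location reports per IP range and relocate the range once one country dominates. This is a hypothetical illustration of the mechanism, not Google's actual pipeline; the function, thresholds, and sample counts are all invented.

```python
from collections import Counter

def infer_range_country(observations, min_samples=1000, dominance=0.8):
    """observations: ISO country codes reported by devices seen on this
    IP range (derived from GPS fixes, Wi-Fi APs, cell tower IDs)."""
    if len(observations) < min_samples:
        return None  # not enough evidence yet; keep the existing location
    counts = Counter(observations)
    country, n = counts.most_common(1)[0]
    if n / len(observations) >= dominance:
        return country  # the range "drifts" to the dominant device country
    return None

# A German VPN range used mostly by clients physically in Iran:
obs = ["IR"] * 900 + ["DE"] * 100
print(infer_range_country(obs))  # -> IR
```

With 90% of device reports from Iran, the range flips to IR even though the routers themselves sit in Germany, which is exactly the failure mode the comment describes.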

The owner of those IP addresses can mitigate the issue, mostly by shaping traffic or deliberately feeding corrective signals into Google's system, but I know of no way for anyone else to do it.


They have a correction form but I am not sure if it is super robust: https://support.google.com/websearch/workflow/9308722?hl=en

I talked to someone who bought a /24 from South America to be used in the United States for office use. I asked him to tell everyone to get on WiFi and keep Google Maps running. Apparently, that solved the issue.


Do Cloudflare's floating egress IPs probe in a way where you can easily geolocate them?

https://blog.cloudflare.com/cloudflare-servers-dont-own-ips-...


If it is an anycast IP address, we have hints of all locations. However, because we have to produce a standard IP geolocation product, we can only select one location. So, we choose the location we find in a reliable geofeed and designate the IP address as "anycast" in the API response.

Internally, we have an anycast database. I believe we can also provide all the location hints we see for each anycast IP. It is generally niche data though.


At my previous company we had a subscription to Spur Intelligence. It is like Palantir for IP address info, and probably the closest to what you are talking about.

They recently added GeoIP to their data and in the bit of testing I was able to do before I left it was scary good. I also had an amusing chat with one of their engineers at a conference about how you can spoof IPInfo's location probes...


> how you can spoof IPInfo's location probes...

Interesting. I would love to know how this is possible. Like with Geofeed or something else?


If you're doing latency-based probing, location spoofing is presumably possible to an extent by adding artificial delays and possibly spoofing ICMP "TTL expired" packets like https://github.com/blechschmidt/fakeroute
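The reason added delay works: a latency measurement only yields an *upper* bound on distance (packets travel at most about the speed of light in fiber, roughly 200 km per millisecond one way), so artificial delay can make a host look farther away, never closer. A minimal sketch of that bound, with made-up RTT figures:

```python
KM_PER_MS_FIBER = 200.0  # approx. one-way propagation speed in fiber

def max_distance_km(rtt_ms):
    """Speed-of-light upper bound on probe-to-host distance."""
    return (rtt_ms / 2) * KM_PER_MS_FIBER  # one-way time * propagation speed

real_rtt = 8.0                   # probe near the real host
spoofed_rtt = real_rtt + 100.0   # host inserts 100 ms of artificial delay

print(max_distance_km(real_rtt))     # 800.0  -> host is within ~800 km
print(max_distance_km(spoofed_rtt))  # 10800.0 -> now looks intercontinental
```

Faking "TTL expired" replies, as fakeroute does, attacks the other half of the picture: it fabricates intermediate router hops so the path itself appears to traverse a different region.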


I am not sure whether this kind of IP spoofing would impact our accuracy, because we would likely identify the noise and behavioral anomalies and discard the location hint derived from the traceroute.

We have tons of historical traceroute data patterns, and generic traceroute behaviors are likely modeled out internally. So, if you can spoof the traceroute to your IP address, our traceroute-based location hint scoring weight for that IP address will decrease, and we will rely on the other location hints.
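The down-weighting described above can be sketched as a simple weighted vote over location hints, where a hint flagged as anomalous contributes much less, so the remaining sources win. This is a hypothetical illustration; the hint types, weights, and penalty factor are invented, not IPinfo's actual scoring model.

```python
def combine_hints(hints, anomaly_penalty=0.1):
    """hints: list of (country, weight, anomalous) tuples, one per
    location-hint source (traceroute, geofeed, latency, ...).
    Returns the country with the highest combined score."""
    scores = {}
    for country, weight, anomalous in hints:
        if anomalous:
            weight *= anomaly_penalty  # distrust spoofed-looking evidence
        scores[country] = scores.get(country, 0.0) + weight
    return max(scores, key=scores.get)

hints = [
    ("IR", 0.9, True),   # traceroute hint, but the path looks spoofed
    ("DE", 0.5, False),  # geofeed hint
    ("DE", 0.4, False),  # latency hint
]
print(combine_hints(hints))  # -> DE: the spoofed traceroute loses the vote
```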

You have to be extremely deliberate to mislead us. I would love to see this in action, though.


Yeah, I doubt there are more than a couple of hosts on the entire internet serving fake traceroutes anyway. Even finding hosts that don't enforce BCP38 requires quite some effort these days.


I don't think it is fair to IPInfo to give the specifics publicly, because once you have the "ah ha" moment you realize it is an entire class of difficult-to-address problems with how they use their sensor network. That knowledge only helps the bad guys.


We are actively trying to improve our system and build it to be, figuratively speaking, 'antifragile'. We cannot afford to get comfortable, and we need to constantly find faults in it. If you know anything, you can contact our founder or me directly.

The problem is that everyone knows we are the most accurate data provider and our growth is exponential. To my knowledge, most cybersecurity teams use our data to some degree. We cannot risk having any secrets out there that could disrupt the accuracy of the system. We are aware of several cases where accuracy may be affected, with the most notable being adversarial geofeed submissions.

If the issue is an adversarial geofeed submission, that is a well-known problem. When active measurement fails, we have to fall back to some location hint. There are layers of location hints we fall through before ultimately landing on echoing the geofeed's location hint.
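Geofeeds (RFC 8805) are self-published CSV files of the form `prefix,country,region,city,postal_code`, which is exactly why an adversarial submission is cheap: the operator just writes false rows. A minimal parser sketch, with made-up feed contents:

```python
import csv
import io
import ipaddress

feed = """# example geofeed (contents are made up)
203.0.113.0/24,DE,DE-HE,Frankfurt,
2001:db8::/32,IR,,,
"""

def parse_geofeed(text):
    """Map each published prefix to the country its operator claims."""
    entries = {}
    for row in csv.reader(io.StringIO(text)):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comments
        prefix = ipaddress.ip_network(row[0])
        entries[prefix] = row[1]  # country code claimed by the operator
    return entries

print(parse_geofeed(feed))
# {IPv4Network('203.0.113.0/24'): 'DE', IPv6Network('2001:db8::/32'): 'IR'}
```

Nothing in the format itself proves the prefix is where the row says it is, which is why a provider echoing geofeed data as a last-resort hint has to weigh it against measurement evidence.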

But aside from that... I'm not sure what could possibly impact us. A substantial, systemic, malicious degradation in data accuracy seems highly unlikely.


Why do we assume that only "bad guys" would want to bypass internet censorship?


Google's GeoIP is creepy good. I noticed a while ago that for fixed (or technically dynamic, but rarely actually changing) IPs, their IP geolocation eventually converges on the exact street address. Presumably Google crowdsources geolocation from devices with GPS or Wi-Fi geolocation access, which is in turn crowdsourced from devices with both GPS and Wi-Fi.


It's pretty slow to converge though, as it needs enough data points so they cross some certainty threshold. Especially in the context of VPN exit points as the traffic comes from all over the world.


Google's GeoIP is rubbish for me. Often it's hundreds of kilometres off, and varies a lot even for a fixed IP.


As always with big corporations, if the experience is OK for 90% of people but absolutely sucks for 10% of people, then that's totally fine!


I can tell you how we approach enterprise partnerships: absolute accountability. If something is wrong with the data, it is not our customers' fault for trusting us, it is our fault. End users talk to us directly. And because the data is so good these days, we just have to present evidence, that's it.

We partner with multi-billion-dollar corporations, and for every product integration we maintain an active, visible presence in their user communities.

For example: https://community.cloudflare.com/search?q=ipinfo%20order%3Al...

Customer support teams are encouraged to build support pipelines that either route data-related questions to us or send users to us directly. We remove friction rather than hiding behind layers of enterprise support.

We make a deliberate "account manager for everyone" effort when introducing ourselves to a partner's user community. We engage with influential community members and MVP users and encourage them to contact us directly when issues arise. We also connect with the engineers who work hands-on with our data and make it clear that they have a direct line to our engineering team.

We actively and aggressively monitor social media for reports of issues related to our data within partner platforms and engage with users directly when something comes up.

To be honest, this is not difficult. Once or twice a month, we may need to present evidence to a user to explain our data decision.

This is not a paid add-on or a special clause in an enterprise contract. Our customers do not pay extra for this level of engagement.

Developers hold us in high regard. Maintaining that trust requires ongoing investment of time and resources. We fundamentally believe developers trust us because of the quality of the product and the lengths we go to provide clear, honest explanations when questions arise.


90% of end users, not 90% of your customers. If your product blocks 10% of end users because it provides wrong geolocation data to your customer, sucks to be them!


That is a great point! For us, it is 100% of end users not limited to our customers. If you are impacted by our data in any way, it is on us. We are accountable for that.

https://community.ipinfo.io/t/wrong-geolocation-based-on-ip-...

Our free database is licensed under CC BY-SA (freely distributable, but requiring attribution and share-alike) because of accountability. Whether you use our data as an enterprise or in a free open-source project, if there is any issue, you can come and talk with us.

It is not even limited to end users. We maintain open communication policies in general. Even if a streaming service does not use our data, if they come to us, we try our best to help them based on our industry knowledge.


How can somebody who is blocked from (looking at your homepage) Docker Hub or Microsoft know that the reason they are blocked is that you have wrong data on them? How would they know to ask you? If they ask Docker Hub or Microsoft, they'll get funnelled into the "well it works for 90% of people" funnel.

Also the reason most IP information companies don't do this is the obvious risk of false information. I am currently in Somalia via a remote connection via Germany. Actually I'm not, but if I emailed you and said I was, how would you know?



