Shameless plug: More than a decade ago, I wrote a paper [1] on how the random perturbations in the wireless channel between an ambient RF transmitter (FM radio, TV) to the two devices, allow nearby devices to authenticate locality because the perturbations are correlated only if they are nearby (where nearby is relative to the wavelength being monitored)
There are a couple of difficult cryptographic-type problems with this plan. (Which I like, by the way). I don’t think it’s privacy preserving, and I also don’t think it works well for finding shared locations as currently specified.
Also, the privacy and the finding are in direct opposition to each-other, which isn’t always a comfortable system dynamic.
On the one hand, a simple hash of ESSIDs near you, if you take all of them, is highly unlikely to ever match - radio strength varies and you’ll see stuff on the edges of your device pop in and out if you look at radio traces. So you need to limit the list.
However, if you limit the list, to say 3 ESSIDs, you’re well into rainbow table attack territory - if there are 100mm WiFi access points in the world, then you need 100mm^3 hashes - doable - and if you rough geoloc them first so that you’re not hashing out stuff more than a mile away from other APs, you’re down to “very manageable”.
At the same time, the question of “which 3” means that it’s going to be hard to ever get the same list, or at least you’re not in the one 9s territory of loc matching.
To do this without some sort of either trusted server or some sort of group key sharing (and therefore a totally different threat model) you’ll need to get some sort of location-aware hashing together, and I think also you’ll want to be able to get some sort of data from the local APs that’s not easily accessible elsewhere. Not sure what that is off the top of my head, but I bet there’s something in the WPS spec that you could hang off of.
So if you had the ability to be like “my hash puts me somewhere within this square (area) and only those of us here know that the secret salt for this minute is XXX” then I think you’d get back to the original goals of the project.
I bet that’s doable! Looking forward to v2
Thanks for the feedback! Rainbow tables probably won't work here since the epoch (rounded to 5 minutes) salts everything, so precomputed tables would expire constantly. You would use use BSSIDs + SSID in real implementations. The MinHash part creates 128 hash values from the entire observed set, not a subset of 3, then LSH divides these into bands where similar sets collide probabilistically based on Jaccard similarity. I've tested with two phones where one saw 3-4 networks and the other saw ~10. They still found each other, and you can try it in the interactive demo yourself. You're right it's not entirely privacy preserving, really depends on your threat model as I discuss in the security section. Your idea about combining area hints with time-based salts is interesting though, feels like it could bridge geospatial indexing with this approach! The whole geospatial indexing thing was something I became aware of late in the project and didn't go into further. It's probably a much simpler approach if geolocation is important. In the current approach, geolocation doesn't matter at all, only 'the environment', whatever that may be.
Thanks for the detailed response! I completely missed the band division / similarity plan - I’ll read more thoroughly next time.
I thought about geo largely because it radically changes the order of magnitude of work necessary; it lets you segment ‘possible’ subsets of APs down to sets of say 100, not millions, and changes the combinatorics. A side effect is knowing a rough spatial location.
Off the top of my head, I don’t think that epochs alone make a big difference. If I want to see if you’ve been somewhere, or tell you I’m somewhere, why not take the 3-4 networks you mentioned, and forward hash them for the next million epochs?
Or, more ambitiously, why not take 3-4 networks each from the geo indexed clusters available at https://wigle.net/ and do the forward and backward epochs, letting me track where you’ve been and pretend to be near you any time in the future?
Wigle reports 1.7bn networks; a rough look at a suburban street near me shows most places have 10 in a reasonable range boundary; so call it 200mm “locations” with 128 segmented hashes, 250 billion hashes per epoch — I think we’re in the “seconds per epoch” range for a reasonable compute heavy server to cover the entire space.
Upshot - I think the salting needs to be something local / not predictable or stored remotely.
Hopefully these comments hit you right - I like the idea a lot - and I don’t fully understand the system - but as I understand it, the system does not offer privacy — I could replay any phone’s hashes against a system that cost a few dollars to reconstruct your location and time, if my understanding is correct.
I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks. I was under the assumption services like Wigle only provided the reverse lookup (wifi -> coords). Indeed an attacker could potentially reverse the LSH tags if it hashed the wifi environment within very small geofences. It's bit of a needle & haystack problem but not an impossible one with enough resources. I wouldn't say it's a perfect system and I don't mind it falling apart under scrutiny, I just found it an interesting idea so I really appreciate you thinking along here.
Edit: Maybe some preshared group hash (kinda beats the point), or combining multiple modalities (eg bluetooth, shared interests) or some kind of proof of work token could help mitigate some of these issues. I guess anything to reduce the time to attack helps in this case? Or anything that really pins down environment + time, like what smath described in his comment. In essence, the core idea of minhash + lsh works and it doesn't limit you to just wifi networks. The key is being able to grab a fingerprint that is unique enough and different enough each epoch. Wifi networks are just easy enough to grab vs something more low level like an APs beacon timing interval jitter or something.
> I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks
I think this is the issue, is these datasets are out there and at least big tech companies have them since they're used to assist with GPS. I was about to post the same thing as above but saw vessenes beat me to it.
Without thinking about it too hard, the two directions I see are either making observations of the environment in real-time that is only relevant at that time (IE sniffing actual wireless frames, even if they're encrypted and making observations on them, however, most devices won't let you go into promiscuous mode and do this) or encrypting the messages in flight so only participants can decrypt them (IE a model like the signal protocol with E2E message encryption).
Anyways, this is a cool approach, but that risk occurred to me as well about the ability to just brute force the entire dataset to decode every location.
I never got very far with this idea beyond coming up with it, but whe thinking about unfakable calculable characteristics that could be used for locality verificacion, i could only come up with latency as the key characteristic. Based on some geographical threshold distance the speed of light gives a reasonable estimate of how much latency there ought to be when communicating with someone. So if you could issue a challenge and time the response, you should know someone is local to you or not.
I've noticed many metro systems in cities do somehow receive GPS and cellular networks which I'm always impressed by. London on the other hand, has only recently started receiving signal for cellular networks, and rollout is slow.
GPS does not work at all. I've always thought using the WiFi access points that have been installed on the underground could be a great addition to something like Citymapper to figure out where you are located even when there is no GPS.
Both Apple (WPS?) and Android (Location Accuracy) support improving location through WiFi access points and cellular network discovery. That's usually why you are able to get a lock onto your position even while underground.
Wi-Fi and Cellular is why your phone knows where it is, that already works.
I believe Google Maps on iOS can tell you where you are on the Underground. But I presume it’s not using GPS, as you say.
(disclaimer: I am co-inventor at a previous employer, I don't get royalties for it, just reporting)
Thanks for the heads up! Good to know what's out there. Interesting that I independently arrived at something possibly similar.
>Good to know what's out there.
It opens you up to legal risk for knowingly infringing patents. If possible you never should look at a patent.
[dead]
[dead]
TL;DR: it's just using the WiFi networks' names to decide whether you're close or not.
I thought "environmental fingerprints" were referring to something more elaborate, like a fingerprint of the local audio environment, or using the accelerometer to measure the local spacetime curvature.
It's not limited to WiFi networks and indeed the fingerprints you suggest would be a lot better because they would change significantly over time. The whole WiFi networks idea kinda stuck because it's an obvious an easy thing to grab from the environment using a mobile phone. An audio fingerprint would definitely be doable on mobile.
Shameless plug: More than a decade ago, I wrote a paper [1] on how the random perturbations in the wireless channel between an ambient RF transmitter (FM radio, TV) to the two devices, allow nearby devices to authenticate locality because the perturbations are correlated only if they are nearby (where nearby is relative to the wavelength being monitored)
[1] https://dspace.mit.edu/handle/1721.1/121443
There are a couple of difficult cryptographic-type problems with this plan. (Which I like, by the way). I don’t think it’s privacy preserving, and I also don’t think it works well for finding shared locations as currently specified.
Also, the privacy and the finding are in direct opposition to each-other, which isn’t always a comfortable system dynamic.
On the one hand, a simple hash of ESSIDs near you, if you take all of them, is highly unlikely to ever match - radio strength varies and you’ll see stuff on the edges of your device pop in and out if you look at radio traces. So you need to limit the list.
However, if you limit the list, to say 3 ESSIDs, you’re well into rainbow table attack territory - if there are 100mm WiFi access points in the world, then you need 100mm^3 hashes - doable - and if you rough geoloc them first so that you’re not hashing out stuff more than a mile away from other APs, you’re down to “very manageable”.
At the same time, the question of “which 3” means that it’s going to be hard to ever get the same list, or at least you’re not in the one 9s territory of loc matching.
To do this without some sort of either trusted server or some sort of group key sharing (and therefore a totally different threat model) you’ll need to get some sort of location-aware hashing together, and I think also you’ll want to be able to get some sort of data from the local APs that’s not easily accessible elsewhere. Not sure what that is off the top of my head, but I bet there’s something in the WPS spec that you could hang off of.
So if you had the ability to be like “my hash puts me somewhere within this square (area) and only those of us here know that the secret salt for this minute is XXX” then I think you’d get back to the original goals of the project.
I bet that’s doable! Looking forward to v2
Thanks for the feedback! Rainbow tables probably won't work here since the epoch (rounded to 5 minutes) salts everything, so precomputed tables would expire constantly. You would use use BSSIDs + SSID in real implementations. The MinHash part creates 128 hash values from the entire observed set, not a subset of 3, then LSH divides these into bands where similar sets collide probabilistically based on Jaccard similarity. I've tested with two phones where one saw 3-4 networks and the other saw ~10. They still found each other, and you can try it in the interactive demo yourself. You're right it's not entirely privacy preserving, really depends on your threat model as I discuss in the security section. Your idea about combining area hints with time-based salts is interesting though, feels like it could bridge geospatial indexing with this approach! The whole geospatial indexing thing was something I became aware of late in the project and didn't go into further. It's probably a much simpler approach if geolocation is important. In the current approach, geolocation doesn't matter at all, only 'the environment', whatever that may be.
Thanks for the detailed response! I completely missed the band division / similarity plan - I’ll read more thoroughly next time.
I thought about geo largely because it radically changes the order of magnitude of work necessary; it lets you segment ‘possible’ subsets of APs down to sets of say 100, not millions, and changes the combinatorics. A side effect is knowing a rough spatial location.
Off the top of my head, I don’t think that epochs alone make a big difference. If I want to see if you’ve been somewhere, or tell you I’m somewhere, why not take the 3-4 networks you mentioned, and forward hash them for the next million epochs?
Or, more ambitiously, why not take 3-4 networks each from the geo indexed clusters available at https://wigle.net/ and do the forward and backward epochs, letting me track where you’ve been and pretend to be near you any time in the future?
Wigle reports 1.7bn networks; a rough look at a suburban street near me shows most places have 10 in a reasonable range boundary; so call it 200mm “locations” with 128 segmented hashes, 250 billion hashes per epoch — I think we’re in the “seconds per epoch” range for a reasonable compute heavy server to cover the entire space.
Upshot - I think the salting needs to be something local / not predictable or stored remotely.
Hopefully these comments hit you right - I like the idea a lot - and I don’t fully understand the system - but as I understand it, the system does not offer privacy — I could replay any phone’s hashes against a system that cost a few dollars to reconstruct your location and time, if my understanding is correct.
I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks. I was under the assumption services like Wigle only provided the reverse lookup (wifi -> coords). Indeed an attacker could potentially reverse the LSH tags if it hashed the wifi environment within very small geofences. It's bit of a needle & haystack problem but not an impossible one with enough resources. I wouldn't say it's a perfect system and I don't mind it falling apart under scrutiny, I just found it an interesting idea so I really appreciate you thinking along here.
Edit: Maybe some preshared group hash (kinda beats the point), or combining multiple modalities (eg bluetooth, shared interests) or some kind of proof of work token could help mitigate some of these issues. I guess anything to reduce the time to attack helps in this case? Or anything that really pins down environment + time, like what smath described in his comment. In essence, the core idea of minhash + lsh works and it doesn't limit you to just wifi networks. The key is being able to grab a fingerprint that is unique enough and different enough each epoch. Wifi networks are just easy enough to grab vs something more low level like an APs beacon timing interval jitter or something.
> I see, this sheds some new light on your initial concerns. I'm aware an attacker can keep pretending to be inside an environment once they've seen it. I wasn't accounting for a scenario where an attacker has a huge database for queries like coords -> list of wifi networks
I think this is the issue, is these datasets are out there and at least big tech companies have them since they're used to assist with GPS. I was about to post the same thing as above but saw vessenes beat me to it.
Without thinking about it too hard, the two directions I see are either making observations of the environment in real-time that is only relevant at that time (IE sniffing actual wireless frames, even if they're encrypted and making observations on them, however, most devices won't let you go into promiscuous mode and do this) or encrypting the messages in flight so only participants can decrypt them (IE a model like the signal protocol with E2E message encryption).
Anyways, this is a cool approach, but that risk occurred to me as well about the ability to just brute force the entire dataset to decode every location.
I never got very far with this idea beyond coming up with it, but whe thinking about unfakable calculable characteristics that could be used for locality verificacion, i could only come up with latency as the key characteristic. Based on some geographical threshold distance the speed of light gives a reasonable estimate of how much latency there ought to be when communicating with someone. So if you could issue a challenge and time the response, you should know someone is local to you or not.
I've noticed many metro systems in cities do somehow receive GPS and cellular networks which I'm always impressed by. London on the other hand, has only recently started receiving signal for cellular networks, and rollout is slow.
GPS does not work at all. I've always thought using the WiFi access points that have been installed on the underground could be a great addition to something like Citymapper to figure out where you are located even when there is no GPS.
Both Apple (WPS?) and Android (Location Accuracy) support improving location through WiFi access points and cellular network discovery. That's usually why you are able to get a lock onto your position even while underground.
Wi-Fi and Cellular is why your phone knows where it is, that already works.
I believe Google Maps on iOS can tell you where you are on the Underground. But I presume it’s not using GPS, as you say.
Phones already use WIFI for location.
Watch out, possibly similar to this patent: https://patentimages.storage.googleapis.com/e4/9b/4e/883a9df...
(disclaimer: I am co-inventor at a previous employer, I don't get royalties for it, just reporting)
Thanks for the heads up! Good to know what's out there. Interesting that I independently arrived at something possibly similar.
>Good to know what's out there.
It opens you up to legal risk for knowingly infringing patents. If possible you never should look at a patent.
[dead]
[dead]
TL;DR: it's just using the WiFi networks' names to decide whether you're close or not.
I thought "environmental fingerprints" were referring to something more elaborate, like a fingerprint of the local audio environment, or using the accelerometer to measure the local spacetime curvature.
It's not limited to WiFi networks and indeed the fingerprints you suggest would be a lot better because they would change significantly over time. The whole WiFi networks idea kinda stuck because it's an obvious an easy thing to grab from the environment using a mobile phone. An audio fingerprint would definitely be doable on mobile.
[dead]