I'm now expecting we'll see a couple things in the next few years:
1. An explosion of residential proxy networks and other stuff to circumvent blocking of cloud IP ranges, for all the various AI scraping tools to use.
2. A corresponding explosion of countermeasures to the above. Instead of blocking suspicious IPs, maybe they get a 3GB file on their request to /scrape-target.html
Perhaps an explosion of usage. There are already a few very large residential proxy networks.
And I frequently get contacted by Bright Data to install their code in my repo.
I think that may be against the ToS of most residential ISPs.
Perhaps, but it's already fairly prodigious. Among "ethical" providers, it's often bundled as a background service in a lot of clickwrap "freeware". (To say nothing of compromised computers in a botnet)
Hello,
A (different) proxy company owner here. This sucks! Sorry that you lost out on so much bandwidth.
Feel free to reach out to me at tim@pingproxies.com and I'd be happy to get you set up on our service and credit you with 100GB of free bandwidth to help soften the blow. I'll also be able to get you pricing a little better than you're currently on if you are interested ;)
Within the next few months we're also releasing a bunch of tools to help stop things like this happening on our residential network such as some intelligent routing logic, spend controls and a few other things.
You may also want to look into Static Residential ISP Proxies - we charge these per IP address rather than bandwidth and they often end up more economical. We work with carriers like Spectrum, Comcast & AT&T directly to get IP addresses on their networks so they look like residential connections but host them in datacenters - this way you get 99.99%+ availability, 1G+ throughput, stable IP addresses and have unlimited bandwidth.
@ everyone else in the thread; if you run a start-up and need proxies then email me - happy to credit you with 50GB free residential bandwidth + give some advice on infra if needed.
Cheers,
Tim at Ping
I’m interested to know how your residential connections are sourced.
It says they’re “ethically sourced”, but it seems like malware/botnet-like behavior.
Are these residential users aware their traffic is siphoned off for this purpose?
Literally everyone says they use ethical sourcing, but I never believe that about any residential proxy service without solid proof.
They are never ethically sourced. "Ethically" for them means placing a phrase in a 10k-word TOS when victims install app X or game Y, which loads their SDK. "Ethically" here means "we warned them in a TOS".
Huh?
> We work with carriers like Spectrum, Comcast & AT&T directly to get IP addresses on their networks so they look like residential connections but host them in datacenters - this way you get 99.99%+ availability, 1G+ throughput, stable IP addresses and have unlimited bandwidth.
mhm, meanwhile his website says he has "Access our 115+ Million proxy network."
huh?
I feel like everyone takes statements at 100% truth without ever considering the context or the source.
A single reply with nothing but a quote from the person under investigation is apparently enough to squash all wrongdoing.
I know it's not what they mean, but 115+ million IPv6 addresses easily fit in a /64.
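For anyone who wants to check the /64 remark, here is a quick sketch using Python's standard `ipaddress` module. The `2001:db8::/64` prefix is only the documentation range, picked as a placeholder; it has nothing to do with any provider's actual network, and the 115 million figure is just the number quoted above.

```python
import ipaddress

# Placeholder /64 from the IPv6 documentation range (an assumption for illustration).
subnet = ipaddress.ip_network("2001:db8::/64")

# The "115+ Million" pool size quoted from the website above.
advertised_pool = 115_000_000

# A /64 leaves 64 host bits, so it holds 2**64 addresses.
addresses_in_subnet = subnet.num_addresses

# How many pools of that size would fit inside a single /64.
pools_per_subnet = addresses_in_subnet // advertised_pool

print(addresses_in_subnet)
print(pools_per_subnet)
```

So the raw address count says very little unless the addresses are spread across many distinct networks.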
Our main business is Static ISP Proxies; here we liaise directly with datacenters and carriers such as ATT, Comcast and others to bring subnets to their network and we'll then purchase IP transit from them.
We do also have residential peer proxies available - you're right to have ethical concerns as there are bad actors out there that effectively build botnets and spread malware to get their nodes but the industry has developed a lot over the last few years and there are numerous companies, including ourselves, which have pretty strict ethical guidelines. There are three main ways to ethically source real residential nodes:
1. Direct payment to peers for traffic sent through their devices. There are several networks like EarnApp, Honey, Pawns and others where people can sign up and earn money for bandwidth sent through their devices. We liaise with these networks to add nodes to our pool.
2. Quid pro quo with peers: providing free apps in return for the ability to route traffic through their devices. We don't currently engage in this method but we are planning on doing so within the next 12 months through a free VPN - the important thing here is that peers have to understand what they're signing up for in return for the free service - as long as you're upfront, then it is my belief that there is informed consent and it is therefore ethical; there is often a good value proposition to the customer in these cases, i.e. spend $7 a month on a paid VPN service or get a free one in return for exchanging a small amount of bandwidth which has zero marginal cost.
3. Offer an SDK to developers to monetize applications - this is pretty common and while it is similar to 2., the ability to distribute the SDK to various developers makes it easier to get a large number of peers online. Again though, it's important that app developers provide notice of this to their users, and most reputable SDK providers have strict guidelines and mandatory screens that must be shown to end users prior to registering them as a residential proxy node.
There are also a lot of other things involved in making an ethical network - a big thing is to just signal that bad actors and criminals aren't welcome on your network. This is usually done by banning certain domains; for example, we ban all .edu and .gov domains as well as most banking/finance websites + are a member of the Internet Watch Foundation and block their listed domains. This stops bad actors from using our proxy network for evil + protects peers in the network from bad activity going through their devices.
Happy to answer any other questions if you have them :)
Apparently you consider both 2 and 3 ethical, and your ethical company is at least expanding to 2. In that case, your ethical standard is just very different from many (most?) of us; we classify 2 and 3 as “shady as fuck”, and 1 as questionable.
Shady okay, but I think your line about ethics being "very different" is going too far. "Here is a free VPN that will use some of your bandwidth for other people's connections." is a pretty fair trade. And you don't seem to be accusing them of hiding things or tricking those users, but saying a deal like that is inherently objectionable.
Most of these free VPNs rely on people not reading the agreement, and even when it’s made fairly clear, rely on people not understanding the true meaning of sharing their connections. Don’t get me started on the SDKs. I’m not accusing them of tricking users because I didn’t bother to expand on the topic.
1 is clearly ethical, someone has to install an app specifically for this in exchange for money. Your ISP might not like it but since when does anyone care about an ISP ToS? You're not allowed to pirate movies either according to your ISP.
Some ISP terms are more reasonable than others. Not running a commercial data operation on a residential contract is one of the more reasonable ones; they clearly have commercial contracts available, you know why people who run lease-your-upload-bandwidth-for-money software aren’t choosing them. Both the people running the apps and the ones encouraging them to do so are questionable here, and the former may not be fully aware of the consequences (I personally know someone who got throttled then eventually banned by their ISP this way) whereas the latter would know.
Btw, no I don’t pirate movies (I do sometimes torrent content I already bought because I don’t like the official player). Again, your ethical standard is different from mine, and mine different from people who don’t torrent at all, for instance.
> there is often a good value proposition to the customer in these cases, i.e. spend $7 a month on a paid VPN service or get a free one in return for exchanging a small amount of bandwidth which has zero marginal cost.
Until someone sends bomb threats or downloads child porn via your IP....
Are you concerned with this activity being prohibited by the AUP of your users' ISP? Do you allow eyeball ASes to opt out of having their network resold in this way?
Not at all. Firstly, just from a legal standpoint, the AUPs aren't signed by us; they're signed by the customer and as long as they understand what they're doing through us ensuring we get informed consent, then it's their responsibility and judgement on whether they want to break the rules.
On to the ethics of it, again I find it pretty hard to side with ISPs here since the only reason they don't want this activity on their network is because they don't want the additional bandwidth flowing through their fiber and personally, I believe if you buy a 100Mb or a 1G internet line from a carrier then it should be yours to use as you wish as long as it remains within the law. This is compounded by the fact that carriers themselves seem to have a tendency to disregard user / privacy agreements and have been happy to sell metadata and location information to any data brokers without ever checking with their customers whether it's okay or not.
This is obviously the opinion of someone who has a stake in the game but when it comes to web-scraping, VPN usage, proxies and internet usage in general I tend to find myself believing in a free and open web with any blocks, restrictions or censorship usually being a bad thing.
> from a legal standpoint, the AUPs aren't signed by us; they're signed by the customer and as long as they understand what they're doing through us ensuring we get informed consent, then it's their responsibility
Have you consulted legal counsel about this? What you're describing sounds like tortious interference.
> only reason they don't want this activity on their network is because they don't want the additional bandwidth flowing through their fiber
As someone who has a stake in a small ISP: this is not true. I don't want you trashing the reputation of my IPs and getting them banned from the services your customers are scraping. Replacing those IPs comes at a significant cost ($8000-9000 per /24).
You've definitely got an interesting view point and I appreciate your take as a stakeholder.
To address the first point, I had to look up tortious interference haha but after seeing the main elements, I don't think we'd be close to meeting that threshold. Mainly because:
1. Offering a service which is then engaged with by an end user != Convincing/Interrupting/Interfering
2. We don't ever know the internet contracts they've signed + I actually don't know whether AUPs prohibit this kind of activity (I don't think it's common, at least in the UK, from my knowledge)
3. I think most ISPs would be hard pressed to prove any material damage
To your second point, since you're a smaller ISP I can understand your position somewhat and my original post was more to do with the big players; we have a lot of experience with large ISPs and they tend to be happy to lease IP space / IP transit if the price is right.
As someone that knows the space very well, I also think the risks here are pretty overstated: subnet bans are incredibly rare and usually caused by activity en masse across an entire block. The likelihood that every single one, or even most, of your customers would be web-scraping with their IP address is pretty much zero. I guess the effect also depends a lot on what the activity is and how the network is run - we're very strict on the traffic that can go through the network and everything high-risk is blocked, i.e. government, edu, banking and extra-extra-bad stuff, to avoid issues. I concede that if the company running the network is allowing mailing and everything else under the sun it would have a larger effect on stakeholders such as yourself, but I think to broadly say that web-scraping on a small portion of IPs on an AS ends in the ISP having to purchase new prefixes is a stretch and hypothetical. If you do actually have experience and have had to purchase new subnets in the past because of this stuff then I'd definitely be interested to hear more (tim@pingproxies.com) and I'd be happy to remove any IPs from your AS if they're present in our network.
Cheers,
Tim at Ping
Hey, any experience with running bots for games on your network? Most of them block signups or auto-ban datacenter IPs at this point. Curious if you might be a valid alternative.
Best to hop on with support@pingproxies.com and explain your use-case. They'll be able to say whether or not we have a service that fits your needs.
Cheers,
Tim at Ping
The fact that you're willing to entertain this request is rather telling.
Video game bots aren't illegal and they're barely immoral. I purchased the game, let me play it how I want to, one could argue. Sure it's probably against the ToS, but who reads that anyway right?
Sigh. This argument again?
Modern video games - certainly the ones that people care about running bots for - are multiplayer games. Players using bots actively degrade the game experience for other players.
(Besides, people inquiring about running bots on residential proxies usually aren't in it for their own enjoyment. That sort of commitment typically means they're doing it for profit, e.g. to sell in-game items and/or "boosted" accounts.)
200GB is nothing since 2018 when AT&T mass introduced their 1-gig symmetric fiber. Any single common gigabit link can run 200GB in under 30 minutes.
On any gig link, over the course of 6 hours you can transmit about 2.7TB one way, which is more than 13x as much.
Too bad AWS didn’t get that memo, 200GB would cost $18 there, and somehow the company in the original post is paying $500 for that bandwidth with whoever their proxy host is.
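A rough sanity check of the numbers in this subthread, as a sketch. It assumes decimal units (1 GB = 10^9 bytes), a link running at full line rate with no protocol overhead, and the $0.09/GB AWS egress rate mentioned elsewhere in the thread; the function names are made up for illustration.

```python
def transfer_seconds(gigabytes: float, link_gbps: float = 1.0) -> float:
    """Seconds needed to move `gigabytes` over a link at full line rate."""
    bits = gigabytes * 8e9
    return bits / (link_gbps * 1e9)


def gigabytes_in(hours: float, link_gbps: float = 1.0) -> float:
    """Gigabytes a saturated link can carry in `hours`, one direction."""
    bytes_per_second = link_gbps * 1e9 / 8
    return bytes_per_second * hours * 3600 / 1e9


def cost_per_gb(total_dollars: float, gigabytes: float) -> float:
    """Effective price per gigabyte."""
    return total_dollars / gigabytes


minutes_for_200gb = transfer_seconds(200) / 60   # about 26.7 minutes
six_hour_capacity_gb = gigabytes_in(6)           # 2700 GB, i.e. 2.7 TB
proxy_rate = cost_per_gb(500, 200)               # $2.50 per GB
aws_cost_for_200gb = 200 * 0.09                  # about $18

print(round(minutes_for_200gb, 1), six_hour_capacity_gb, proxy_rate, aws_cost_for_200gb)
```

Real links will do somewhat worse than line rate, so treat these as upper bounds on throughput.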
Haha unfortunately we use residential proxies under the hood to simulate real users (as you'd expect from AI agents), where bandwidth is significantly more expensive!
How does a residential proxy work? Do people rent out their internet connections to commercial services?
Computers getting infected with malware, pre-compromised cheap internet devices from Amazon/Wish.com, and game developers monetizing "free" games by running proxies in the background.
There are usually a few layers of resellers so technically the proxy provider can throw their hands up in the air and say they are unaware of any malicious activity.
The screenshot is from webshare
"200GB of proxy bandwidth was approximately $500 burned over the course of 6 hours"
The fuck? So Internet is literally more expensive than buying a drive at Amazon, paying for shipping, filling it up, and putting it on a truck towards a destination anywhere in the world.
Well, one part of the source of the problem is this, where I don't even understand all of the words (exaggerating a bit):
> Skyvern is an AI agent that helps companies automate workflows in the browser. We run leverage proxy networks and run headful browser instances in the cloud to facilitate most of our automations.
So you're doing Selenium, just with Cloud, AI and some other buzzwords you found while Googling?
I mean, yeah, bandwidth costs aren't just about bytes, they're about energy, infrastructure, and routing complexity too.
200GB for $500? What cloud is this?
I don't think it's a cloud. It's more likely a residential proxy network, which are typically created by installing malware on users' machines.
The operators of these proxy networks want to avoid detection by both the users whose bandwidth they're stealing, and by the companies whose data is being scraped. So they want to make the bandwidth very expensive. And that expensive bandwidth in turn means that their only clients are dodgy as well. Either people looking to scrape data without consent and monetize it, or outright criminals.
I use one. I run a bot on IRC that extracts the <title> of every link posted (or downloads the image/whatever and extracts Metadata) and announces that to the channel. It has become more and more pointless to run this on a vps. Google/YouTube block the IP range, a lot of websites return the cloudflare security check, Amazon works on some days and doesn't on others... Ever since I proxy via residential proxies it just works. I'm a smooth criminal. :>
So much for the open internet.
You can thank the spammers.
I’m not sure how much of this is due to spammers and how much is due to “growth & engagement” that wants to make sure a human’s time is being wasted.
To stop spammers, you implement measures before posting, not before viewing. Spam is just a minor technical nuisance. It's automated interaction that really makes their executives sweat and shiver.
I feel your pain, but I refuse to cave. Say, 10% of the links fail to load, so what? It is their loss, not mine.
There are many reputable residential proxy networks too. Usually there's a lot of vetting involved, as they don't want people running illegal activities through their network.
It's almost a necessity these days to have access to that, given how widely datacenter ranges are blocked.
It's kind of surprising that a presumptively legitimate company (and YC-funded startup) would out themselves as buying black market residential proxy bandwidth, isn't it?
Their frontpage also advertises the ability to pass CAPTCHAs, whether by automation or more likely by delegating them to third-world CAPTCHA farms. If that's a major selling point for your automation service then your target market probably ranges from dubious (e.g. data scrapers trying to get around limits) to extremely dubious (e.g. ticket scalpers, spammers, click fraud, etc).
Just because something can be used for sketchy purposes doesn't mean that's the only purpose of it. there are thousands of situations where people are forced to interact with a shitty website 100x per day and the site won't provide an api. Imagine if your job was booking plane tickets all day. United could provide you an API key to do so via an API, but in practice they won't, only some enterprisey travel software company can get that kind of access, for a steep fee. You could build a tool which automatically puts together an itinerary based on rules and books it, through a tool like this. Perhaps a slightly contrived example but I believe things like this definitely happen.
> United could provide you an API key to do so via an API, but in practice they won't, only some enterprisey travel software company can get that kind of access, for a steep fee. You could build a tool which automatically puts together an itinerary based on rules and books it, through a tool like this. Perhaps a slightly contrived example but I believe things like this definitely happen.
And you think that's NOT sketchy?
I'm almost afraid to ask where you think the bar is...
And why is it? A company provides you an API for a "fee" and a free web-based interface, as long as you are agile enough to use it, with some limitations per ip/cookie. You choose the second path and automate it. What's wrong with that? Limits of the free access are the public contract. You're not obliged to play along with someone's "monetary spirit".
And in practice, APIs are often much more PITA than the actual interface, but you can't buy unlimited web automation. Few years ago one of my projects literally OCRed data from an android phone screen because receiving it via API took a couple minutes longer and involved email-like back and forth with polling and id matching after a convoluted authentication that fails every few requests.
It's exactly as sketchy as having a hypothetical robot sit down at a console and type it out. Which, IMO, is not very sketchy at all.
A very common and pro-consumer use for residential proxies is price scraping and price comparisons.
Most businesses don't want to compete on price and are extremely unhappy if you tell people that their competition sells the same stuff but for less, that their "best deal of the month" is actually a price raise, or that they significantly raise toilet paper prices every time there's a natural disaster.
Agreed. Just for reference, one of our most popular use-cases is automating data entry into CRMs without APIs... No one wants to be doing this stuff manually, and automating it has some serious positive QoL impact
We get a lot of requests for bad usage (ie spinning up upvote rings on Reddit) but we don't want to support things like that
> one of our most popular use-cases is automating data entry into CRMs without APIs... No one wants to be doing this stuff manually, and automating it has some serious positive QoL impact
No-one would need captcha automation or residential proxies for a use case like that that's all on the level.
But no one can or needs to use a residential proxy for automating CRM data entry.
Imagine a legitimate travel agency cannot book 100 United tickets a day via methods outlined in business contracts and need to resort to shady practice.
Dude, please provide some real solid evidence to back this up, and perhaps come up with another realistic scenario where bypassing captcha is justified.
> Imagine a legitimate travel agency cannot book 100 United tickets a day
That's the whole point, I never said travel agency, I was thinking a company with travelling consultants.
How TF is it "shady" to purchase and use airfare?
And again, bypassing captcha, say, to purchase tickets isn't evil either, if you are purchasing them for use and not for resale. It would just allow a person to book tickets for 50 people without wasting 6 hours to complete 25 CAPTCHAS and type in my information 25 times.
CAPTCHA is a blunt instrument deployed in an attempt to mitigate abuse, but it has a massive bad side effect that for every heavy user (not just evil users), it requires a human butt to be in a seat somewhere to do mindless busywork that could otherwise be automated. Working around that (sounds like OP agrees to do so on a case by case basis) is not inherently evil. It's as evil (or benign) as whatever you're using it for.
You ever see that video of the woman paying a thousand dollars to skip to the front of the release day line to buy one of the first generation iPhones?
Then when she did and the employees told her they limited customers to buying one or two iPhones per person, she became incredibly flustered. The guy who sold his spot in the line celebrated with a free phone.
What you’re describing is analogous and there’s a reason that went viral on the internet and was reported on in the mainstream, but I won’t spell it out for you.
How long have you been here? It's not surprising at all. HN and YC have not demonstrated an aversion to "uh, greyhat" activity.
If it were 2000, people would be sharing their ad clicking startups.
YC has funded a looooooot of sketchy companies.
Residential proxies are not necessarily "black market".
It's almost never done with the full understanding of the person providing the proxy, doesn't matter if they get promised some change, their browser addons betray them or they install bundleware/adware.
I'd say it has about the same moral standing as a payday loan.
There are other ways, for example through mislabeled “residential” blocks, or “residential” proxies that are sold by ISPs to vendors.
Usually such proxy networks are outright criminal (even if users are not).
It’s not necessarily malware. There are services that are pretty upfront and pay cash money for residential US bandwidth. That said, naive people might be surprised when their IP starts getting blocked.
>That said, naive people might be surprised when their IP starts getting blocked.
Or law enforcement shows up at their door because their IP is involved in a bunch of illegal stuff.
How does expensive bandwidth equate to dodgy clients? There are lots of valid use cases for scraping data, and it's legal to scrape publicly available data, even if the websites hosting it try to block it (try a curl request to reddit, for example)
>>>and it's legal to scrape publicly available data, even if the websites hosting it try to block it
Is that something that's been fully decided? https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc. is the most relevant case I'm aware of, and it suggests it might actually be illegal (if you know you've been blocked, at least).
I wonder if there is a more recent bug related to this?
I think that Chrome is still doing a lot of downloads. They're just no longer showing them to the user.
And uploads.
I would have liked to see a bit more of 5 Whys here. It seems like a consistent lesson that startups have to learn over and over is how to manage external dependencies, and particularly the dangers of having Google as a dependency. This is new Chrom(e|ium) behavior, and it has a real cost, both for this company and for users, which may or may not be worth the ROI, but this is what happens when you have a large scale external dependency: stuff moves without your knowledge, consent, or control.
Instead of Always. Be. Closing. it should be Always. Be. Mitigating. Dependencies. for startups.
This is a great callout.
We had an internal discussion about how to manage dependencies effectively, and we made the decision to accept the risk that comes with blindly relying on Chrome for now, instead of investing heavily in mitigating that risk today.
The main motivator was for us to continue moving fast, and accept that we have a few hard dependencies in our business.
The goal is to find product market fit, then allocate time to de-risk some of these hard dependencies. If we fail to find product market fit, this may not matter at all
I think that's a fair strategy. Strong PMF generally overcomes weak execution; the challenge is that when you have hard dependencies on entities like Google or Apple it can easily become existential. Even if you choose to move forward with this dependency you should establish guard rails within your system to ensure you catch potentially impactful shifts faster and have a plan for mitigation. For instance, you should identify key points of integration and possible alternatives even if you choose not to migrate now, so that a future migration is better understood and can be discussed intelligently in the heat of the moment. Even internal documentation can assist as a mitigation for dependency risk.
Yeah exactly. One action item from this is that we need to add anomaly detection to our proxy usage metrics so we can catch this in 15 minutes instead of 6 hours :)
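For what it's worth, here is a minimal sketch of the kind of check that could catch a spike like this within one sampling window. It is not Skyvern's implementation: the 15-minute window, the 5x threshold, the `is_anomalous` helper, and the sample numbers are all assumptions for illustration. It simply compares the latest window against the median of earlier windows.

```python
from statistics import median


def is_anomalous(history_mb, current_mb, factor=5.0, min_samples=4):
    """Flag the current window if it exceeds `factor` times the median of past windows.

    Returns False until there are at least `min_samples` past windows,
    since a baseline built from too few points is unreliable.
    """
    if len(history_mb) < min_samples:
        return False
    baseline = median(history_mb)
    # Floor the baseline at 1 MB so a near-idle history does not flag tiny bursts.
    return current_mb > factor * max(baseline, 1.0)


# Hypothetical proxy usage per 15-minute window, in MB.
usage = [120, 135, 110, 128, 140, 9500]

# Indices of the windows that would have raised an alert.
alerts = [i for i in range(len(usage)) if is_anomalous(usage[:i], usage[i])]
print(alerts)  # [5]
```

A median baseline is used rather than a mean so that one earlier spike does not inflate the threshold; pairing an alert like this with a hard spend cap on the provider side would cover the case where nobody is awake to see it.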
What infrastructure is this using? Bandwidth seems pretty pricy
No kidding. AWS's notoriously expensive data transfer is only $0.09/GB. Who's charging $2.50/GB? Are they running on a cellular SIM with no data plan?
Residential rotating proxy providers charge very high rates for data, on the order of $1 - $10 per GB. (These providers often do run their proxies through the cellular network, actually.)
Is this something where end users can get paid for doing nothing other than proxying some traffic through their ISP?
The end user typically has their device compromised by using free apps where the developers were bribed $$$ to add the proxy "SDK". The botnet operator then rents out the bandwidth at exorbitant rates to anyone who will pay for it.
Chrome extensions are also a huge source of this, they look for extensions with a large install base and then make an offer to buy it to turn all the users into proxies.
end users install shady VPN apps/extensions to watch pirated content, and become part of residential proxy mesh/botnet
That's probably where some of the proxies come from.
Yes. Google “honeygain”
Sure, if you want a whole bunch of legitimately malicious traffic to be attributed to your internet account.
If by “some” traffic you mean botnets, sneaker and ticket scalpers, scammers, content scrapers, credential stuffers … generally scummy stuff, sure.
Based on this blog post I would not do any business with Skyvern, if they indeed do business with this underworld of bottom feeders.
Sounds like they are running a web scraping business -- so maybe? Using a cellular connection would be one way to help not get immediately CAPTCHA-ed by every site using Cloudflare.
They should really set up their scraper (and exfil the data) via regular connections.
You're clearly associated with scrapingfish and not just a customer, your entire comment history is just shilling for them.
"We run leverage proxy networks and run headful browser instances"
Um...say what? I'm pretty broadly based in IT, and I have no idea what that means.
Haha, apologies for the language!
We use residential proxy networks when running Skyvern to help simulate real human behaviour (because that's what Skyvern is trying to do).
We run headful browser instances (meaning a real Chrome instance running with a real viewport) for the same reason!
Honestly given many of these stories, $500 seems to be getting off pretty lightly.
It’s still absurd to me that many (most?) of these hosting/bandwidth providers don’t seem to allow automatic cutoffs and such
It definitely could have been much worse. We burned through our monthly allocation in 6 hours HAHA, I'm grateful that our allocation wasn't something like 10TB
yeah, that could have been "exciting" :-O
Blocking Google from downloading anything onto your computer without consent is always a good idea.
We were pretty careful about what we were blocking here -- had the exact same concern. Hopefully it doesn't come back to bite us in the future (new blogpost incoming?)
Especially if you're using expensive bandwidth from botnets.
>200GB of proxy bandwidth
Gigabyte is a measure of information.
Bandwidth is information transmitted over time.
You shouldn’t be paying by the terabyte. Colocate and just pay for the maximum throughput. Far better rates
doesn't work when the sites you're scraping block the IPs/range of your server. They're using a proxy botnet that costs a premium
makes sense but you shouldn't be paying a premium for a non-premium service. IP blocks and bandwidth have low unit cost at scale.
I'm now expecting we'll see a couple things in the next few years:
1. An explosion of residential proxy networks and other stuff to circumvent blocking of cloud IP ranges, for all the various AI scraping tools to use.
2. A corresponding explosion of countermeasures to the above. Instead of blocking suspicious IPs, maybe they get a 3GB file on their request to /scrape-target.html
Perhaps an explosion of usage. There's already a few very large residential proxy networks.
And, I get frequently contacted by Bright data to install their code in my repo.
I think that may be against the ToS of most residential ISPs.
Perhaps, but it's already fairly prodigious. Among "ethical" providers, it's often bundled as a background service in a lot of clickwrap "freeware". (To say nothing of compromised computers in a botnet)
Hello,
A (different) proxy company owner here. This sucks! Sorry that you lost out on so much bandwidth.
Feel free to reach out to me at tim@pingproxies.com and I'd be happy to get you set up on our service and credit you with 100GB of free bandwidth to help soften the blow. I'll also be able to get you pricing alittle better than you're currently on if you are interested ;)
Within the next few months we're also releasing a bunch of tools to help stop things like this happening on our residential network such as some intelligent routing logic, spend controls and a few other things.
You may also want to look into Static Residential ISP Proxies - we charge these per IP address rather than bandwidth and they often end up more economical. We work with carriers like Spectrum, Comcast & AT&T directly to get IP addresses on their networks so they look like residential connections but host them in datacenters - this way you get 99.99%+ availability, 1G+ throughput, stable IP addresses and have unlimited bandwidth.
@ everyone else in the thread; if you run a start-up and need proxies then email me - happy to credit you with 50GB free residential bandwidth + give some advice on infra if needed.
Cheers, Tim at Ping
I’m interested to know how your residential connections are sourced.
It says they’re “ethically sourced”, but it seems like malware/botnet like behavior.
Are these residential users aware their traffic is siphoned off for this purpose?
Literally everyone says they use ethical sourcing, but I never believe that about any residential proxy service without solid proof.
They are never ethically sourced. Ethically for them means placing a phrase in a 10k word TOS when victims install app X or game Y, which loads their SDK. Ethically here means "we warned them in a TOS".
Huh?
> We work with carriers like Spectrum, Comcast & AT&T directly to get IP addresses on their networks so they look like residential connections but host them in datacenters - this way you get 99.99%+ availability, 1G+ throughput, stable IP addresses and have unlimited bandwidth.
mhm, meanwhile his website says he has "Access our 115+ Million proxy network." huh?
I feel like everyone takes statements at 100% truth without ever considering the context or the source.
A single reply with nothing but a quote from the person under investigation is apparently enough to squash all wrongdoing.
I know it's not what they mean, but 115+ million IPv6 addresses easily fit in a /64.
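To put a number on that, here is a quick sketch using Python's standard `ipaddress` module. The prefix is the IPv6 documentation range, picked purely for illustration, and `claimed_pool` is just the "115+ Million" figure quoted above.

```python
import ipaddress

# 2001:db8::/64 sits inside the IPv6 documentation range; it is only an example.
net = ipaddress.ip_network("2001:db8::/64")

# The pool size advertised on the website, as quoted in this thread.
claimed_pool = 115_000_000

print(f"addresses in one /64: {net.num_addresses:,}")
print(f"claimed pool as a share of one /64: {claimed_pool / net.num_addresses:.2e}")
```

A single /64 holds 2^64 addresses, so the advertised pool would be a vanishingly small fraction of one.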
Our main business is Static ISP Proxies; here we liaise directly with datacenters and carriers such as ATT, Comcast and others to bring subnets to their network and we'll then purchase IP transit from them.
We do also have residential peer proxies available - you're right to have ethical concerns as there are bad actors out there that effectively build botnets and spread malware to get their nodes, but the industry has developed a lot over the last few years and there are numerous companies, including ourselves, which have pretty strict ethical guidelines. There are three main ways to ethically source real residential nodes:
1. Direct payment to peers for traffic sent through their devices. There are several networks like EarnApp, Honey, Pawns and others where people can sign up and earn money for bandwidth sent through their devices. We liaise with these networks to add nodes to our pool.
2. Quid pro quo with peers through providing free apps in return for the ability to route traffic through their devices. We don't currently engage in this method but we are planning on doing so within the next 12 months through a free VPN - the important thing here is that peers have to understand what they're signing up for in return for the free service - as long as you're upfront, then it is my belief that there is informed consent and it is therefore ethical; there is often a good value proposition to the customer in these cases, i.e. spend $7 a month on a paid VPN service or get a free one in return for exchanging a small amount of bandwidth which has zero marginal cost.
3. Offer an SDK to developers to monetize applications - this is pretty common and while it is similar to 2. - the ability to distribute the SDK to various developers makes it easier to get a large number of peers online. Again though, it's important that app developers provide notice of this to their users, and most reputable SDK providers have strict guidelines and mandatory screens that must be shown to end users prior to registering them as a residential proxy node.
There are also a lot of other things involved in making an ethical network - a big thing is to just signal that bad actors and criminals aren't welcome on your network. This is usually done by banning certain domains; for example, we ban all .edu and .gov domains as well as most banking/finance websites + are a member of the Internet Watch Foundation and block their listed domains. This stops bad actors from using our proxy network for evil + protects peers in the network from bad activity going through their devices.
Happy to answer any other questions if you have them :)
Apparently you consider both 2 and 3 ethical, and your ethical company is at least expanding to 2. In that case, your ethical standard is just very different from many (most?) of us; we classify 2 and 3 as “shady as fuck”, and 1 as questionable.
Shady okay, but I think your line about ethics being "very different" is going too far. "Here is a free VPN that will use some of your bandwidth for other people's connections." is a pretty fair trade. And you don't seem to be accusing them of hiding things or tricking those users, but saying a deal like that is inherently objectionable.
Most of these free VPNs rely on people not reading the agreement, and even when it’s made fairly clear, rely on people not understanding the true meaning of sharing their connections. Don’t get me started on the SDKs. I’m not accusing them of tricking users because I didn’t bother to expand on the topic.
1 is clearly ethical, someone has to install an app specifically for this in exchange for money. Your ISP might not like it but since when does anyone care about an ISP ToS? You're not allowed to pirate movies either according to your ISP.
Some ISP terms are more reasonable than others. Not running a commercial data operation on a residential contract is one of the more reasonable ones; they clearly have commercial contracts available, you know why people who run lease-your-upload-bandwidth-for-money software aren’t choosing them. Both the people running the apps and the ones encouraging them to do so are questionable here, and the former may not be fully aware of the consequences (I personally know someone who got throttled then eventually banned by their ISP this way) whereas the latter would know.
Btw, no I don’t pirate movies (I do sometimes torrent content I already bought because I don’t like the official player). Again, your ethical standard is different from mine, and mine different from people who don’t torrent at all, for instance.
> there is often a good value proposition to the customer in these cases i.e spend $7 a month on a paid VPN service or get a free one in return for exchanging a small amount of bandwidth which has zero marginal cost.
Until someone sends bomb threats or downloads child porn via your IP....
Are you concerned with this activity being prohibited by the AUP of your users' ISP? Do you allow eyeball ASes to opt out of having their network resold in this way?
Not at all. Firstly, just from a legal standpoint, the AUPs aren't signed by us; they're signed by the customer and as long as they understand what they're doing through us ensuring we get informed consent, then it's their responsibility and judgement on whether they want to break the rules.
On to the ethics of it, again I find it pretty hard to side with ISPs here since the only reason they don't want this activity on their network is because they don't want the additional bandwidth flowing through their fiber and personally, I believe if you buy a 100mb or a 1G internet line from a carrier then it should be yours to use as you wish as long as it remains within the law. This is compounded by the fact that carriers themselves seem to have a tendency to disregard user / privacy agreements and have been happy to sell metadata and location information to any data brokers without ever checking with their customers whether it's okay or not.
This is obviously the opinion of someone who has a stake in the game but when it comes to web-scraping, VPN usage, proxies and internet usage in general I tend to find myself believing in a free and open web with any blocks, restrictions or censorship usually being a bad thing.
> from a legal standpoint, the AUPs aren't signed by us; they're signed by the customer and as long as they understand what they're doing through us ensuring we get informed consent, then its their responsibility
Have you consulted legal counsel about this? What you're describing sounds like tortious interference.
> only reason they don't want this activity on their network is because they don't want the additional bandwidth flowing through their fiber
As someone who has a stake in a small ISP: this is not true. I don't want you trashing the reputation of my IPs and getting them banned from the services your customers are scraping. Replacing those IPs comes at a significant cost ($8000-9000 per /24).
You've definitely got an interesting view point and I appreciate your take as a stakeholder.
To address the first point, I had to look up tortious interference haha but after seeing the main elements, I don't think we'd be close to meeting that threshold. Mainly because:
1. Offering a service which is then engaged with by an end user != Convincing/Interrupting/Interfering
2. We don't ever know the internet contracts they've signed + I actually don't know whether AUPs prohibit this kind of activity (I don't think it's common, at least in the UK, from my knowledge)
3. I think most ISPs would be hard pressed to demonstrate any material damage
To your second point, since you're a smaller ISP I can understand your position somewhat and my original post was more to do with the big players; we have a lot of experience with large ISPs and they tend to be happy to lease IP space / IP transit if the price is right.
As someone that knows the space very well, I also think the risks here are pretty overstated and subnet bans are incredibly rare and usually caused by activity en masse across an entire block. The likelihood that every single one / or most of your customers would be web-scraping with their IP address is pretty much zero. I guess the effect of activity also depends a lot on what activity it is and how the network is run - we're very strict on the traffic that can go through the network and everything high-risk is blocked i.e. government, edu, banking and extra-extra-bad stuff to avoid issues; I concede that if the company running the network is allowing mailing and everything else under the sun it would have a larger effect on stakeholders such as yourself but I think to broadly say web-scraping on a small portion of IPs on an AS ends in the ISP having to purchase new prefixes is a stretch and hypothetical. If you do actually have experience and have had to purchase new subnets in the past because of this stuff then I'd definitely be interested to hear more (tim@pingproxies.com) and I'd be happy to remove any IPs from your AS if they're present in our network.
Cheers, Tim at Ping
Hey, any experience with running bots for games on your network? Most of them will block signups/auto-ban datacenter IPs at this point. Curious if you might be a valid alternative.
Best to hop on with support@pingproxies.com and explain your use-case. They'll be able to say whether or not we have a service that fits your needs.
Cheers, Tim at Ping
The fact that you're willing to entertain this request is rather telling.
Video game bots aren't illegal and they're barely immoral. I purchased the game, let me play it how I want to, one could argue. Sure it's probably against the ToS, but who reads that anyway right?
Sigh. This argument again?
Modern video games - certainly the ones that people care about running bots for - are multiplayer games. Players using bots actively degrade the game experience for other players.
(Besides, people inquiring about running bots on residential proxies usually aren't in it for their own enjoyment. That sort of commitment typically means they're doing it for profit, e.g. to sell in-game items and/or "boosted" accounts.)
200GB has been nothing since 2018, when AT&T mass-introduced their 1-gig symmetric fiber. Any single common gigabit link can move 200GB in under half an hour.
On any gig link, over the course of 6 hours you can transmit about 2.7TB one way, which is more than 13x as much.
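For anyone who wants to check the link-speed arithmetic, here is a rough sketch. It assumes an ideal 1 Gbit/s line rate with no protocol overhead and decimal units (1 GB = 10^9 bytes); the function names are made up for illustration.

```python
def transfer_seconds(gigabytes: float, link_gbps: float = 1.0) -> float:
    """Seconds needed to move `gigabytes` (decimal GB) at `link_gbps` Gbit/s."""
    return gigabytes * 8 / link_gbps


def volume_terabytes(hours: float, link_gbps: float = 1.0) -> float:
    """Decimal TB moved one way in `hours` at `link_gbps` Gbit/s."""
    gigabytes_per_second = link_gbps / 8
    return gigabytes_per_second * hours * 3600 / 1000


minutes_for_200gb = transfer_seconds(200) / 60
tb_in_6_hours = volume_terabytes(6)

print(f"200 GB at 1 Gbit/s: {minutes_for_200gb:.1f} minutes")
print(f"6 hours at 1 Gbit/s: {tb_in_6_hours:.2f} TB")
```

Under those assumptions, 200GB takes roughly 27 minutes of a saturated gigabit link, and six hours moves about 2.7TB one way.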
Too bad AWS didn’t get that memo, 200GB would cost $18 there, and somehow the company in the original post is paying $500 for that bandwidth with whoever their proxy host is.
Haha unfortunately we use residential proxies under the hood to simulate real users (as you'd expect from AI agents), where bandwidth is significantly more expensive!
How does a residential proxy work? Do people rent out their internet connections to commercial services?
Computers getting infected with malware, pre-compromised cheap internet devices from Amazon/Wish.com, and game developers monetizing "free" games by running proxies in the background.
There are usually a few layers of resellers so technically the proxy provider can throw their hands up in the air and say they are unaware of any malicious activity.
The screenshot is from webshare
" 200GB of proxy bandwidth was approximately $500 burned over the course of 6 hours"
The fuck? So Internet is literally more expensive than buying a drive at Amazon, paying for shipping, filling it up, and putting it on a truck towards a destination anywhere in the world.
Well, one part of the source of the problem is this, where I don't even understand all of the words (exaggerating a bit):
> Skyvern is an AI agent that helps companies automate workflows in the browser. We run leverage proxy networks and run headful browser instances in the cloud to facilitate most of our automations.
So you're doing Selenium, just with Cloud, AI and some other buzzwords you found while Googling?
I mean, yeah, bandwidth costs aren't just about bytes, they're about energy, infrastructure, and routing complexity too.
200GB for $500? What cloud is this?
I don't think it's a cloud. It's more likely a residential proxy network, which are typically created by installing malware on users' machines.
The operators of these proxy networks want to avoid detection by both the users whose bandwidth they're stealing, and by the companies whose data is being scraped. So they want to make the bandwidth very expensive. And that expensive bandwidth in turn means that their only clients are dodgy as well. Either people looking to scrape data without consent and monetize it, or outright criminals.
I use one. I run a bot on IRC that extracts the <title> of every link posted (or downloads the image/whatever and extracts metadata) and announces that to the channel. It has become more and more pointless to run this on a VPS. Google/YouTube block the IP range, a lot of websites return the Cloudflare security check, Amazon works on some days and doesn't on others... Ever since I started proxying via residential proxies it just works. I'm a smooth criminal. :>
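For readers unfamiliar with this kind of bot, the title-extraction step might look roughly like the sketch below. It is not the commenter's code: it uses only Python's standard `html.parser`, skips the fetching and proxying entirely, and `TitleParser` and `extract_title` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.done = False
        self.title_parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones (e.g. inside SVG) are ignored.
        if tag == "title" and not self.done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self.done = True

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_title(html: str) -> Optional[str]:
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = TitleParser()
    parser.feed(html)
    title = " ".join("".join(parser.title_parts).split())
    return title or None


page = "<html><head><title> Example\n Domain </title></head><body>hi</body></html>"
print(extract_title(page))
```

The hard part the comment describes is not this parsing but getting the page at all when the request comes from a datacenter IP range.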
So much for the open internet.
You can thank the spammers.
I’m not sure how much of this is due to spammers and how much is due to “growth & engagement” that wants to make sure a human’s time is being wasted.
To stop spammers, you implement measures before posting, not before viewing. Spam is just a minor technical nuisance. It's automated interaction that really makes their executives sweat and shiver.
I feel your pain, but I refuse to cave. Say, 10% of the links fail to load, so what? It is their loss, not mine.
There are many reputable residential proxy networks too. Usually there's a lot of vetting involved, as they don't want people running illegal activities through their network.
It's almost a necessity these days to have access to that due to how much datacenter ranges are blocked.
It's kind of surprising that a presumptively legitimate company (and YC-funded startup) would out themselves as buying black market residential proxy bandwidth, isn't it?
Their frontpage also advertises the ability to pass CAPTCHAs, whether by automation or more likely by delegating them to third-world CAPTCHA farms. If that's a major selling point for your automation service then your target market probably ranges from dubious (e.g. data scrapers trying to get around limits) to extremely dubious (e.g. ticket scalpers, spammers, click fraud, etc).
Just because something can be used for sketchy purposes doesn't mean that's the only purpose of it. There are thousands of situations where people are forced to interact with a shitty website 100x per day and the site won't provide an API. Imagine if your job was booking plane tickets all day. United could provide you an API key to do so via an API, but in practice they won't, only some enterprisey travel software company can get that kind of access, for a steep fee. You could build a tool which automatically puts together an itinerary based on rules and books it, through a tool like this. Perhaps a slightly contrived example but I believe things like this definitely happen.
> United could provide you an API key to do so via an API, but in practice they won't, only some enterprisey travel software company can get that kind of access, for a steep fee. You could build a tool which automatically puts together an itinerary based on rules and books it, through a tool like this. Perhaps a slightly contrived example but I believe things like this definitely happen.
And you think that's NOT sketchy?
I'm almost afraid to ask where you think the bar is...
And why is it? A company provides you an API for a "fee" and a free web-based interface, as long as you are agile enough to use it, with some limitations per ip/cookie. You choose the second path and automate it. What's wrong with that? Limits of the free access are the public contract. You're not obliged to play along with someone's "monetary spirit".
And in practice, APIs are often much more PITA than the actual interface, but you can't buy unlimited web automation. Few years ago one of my projects literally OCRed data from an android phone screen because receiving it via API took a couple minutes longer and involved email-like back and forth with polling and id matching after a convoluted authentication that fails every few requests.
It's exactly as sketchy as having a hypothetical robot sit down at a console and type it out. Which, IMO, is not very sketchy at all.
A very common and pro-consumer use for residential proxies is price scraping and price comparisons.
Most businesses don't want to compete on price and are extremely unhappy if you tell people that their competition sells the same stuff but for less, that their "best deal of the month" is actually a price raise, or that they significantly raise toilet paper prices every time there's a natural disaster.
Agreed. Just for reference, one of our most popular use-cases is automating data entry into CRMs without APIs... No one wants to be doing this stuff manually, and automating it has some serious positive QoL impact
We get a lot of requests for bad usage (ie spinning up upvote rings on Reddit) but we don't want to support things like that
> one of our most popular use-cases is automating data entry into CRMs without APIs... No one wants to be doing this stuff manually, and automating it has some serious positive QoL impact
No-one would need captcha automation or residential proxies for a use case like that that's all on the level.
But no one can or needs to use a residential proxy for automating CRM data entry.
Imagine a legitimate travel agency that cannot book 100 United tickets a day via methods outlined in business contracts and needs to resort to shady practices.
Dude, please provide some real solid evidence to back this up, and perhaps come up with another realistic scenario where bypassing captcha is justified.
> Imagine a legitimate travel agency cannot book 100 United tickets a day
That's the whole point, I never said travel agency, I was thinking a company with travelling consultants.
How TF is it "shady" to purchase and use airfare?
And again, bypassing captcha, say, to purchase tickets isn't evil either, if you are purchasing them for use and not for resale. It would just allow a person to book tickets for 50 people without wasting 6 hours to complete 25 CAPTCHAS and type in my information 25 times.
CAPTCHA is a blunt instrument deployed in an attempt to mitigate abuse, but it has a massive bad side effect that for every heavy user (not just evil users), it requires a human butt to be in a seat somewhere to do mindless busywork that could otherwise be automated. Working around that (sounds like OP agrees to do so on a case by case basis) is not inherently evil. It's as evil (or benign) as whatever you're using it for.
You ever see that video of the woman paying a thousand dollars to skip to the front of the release day line to buy one of the first generation iPhones?
Then, when she did, the employees told her they limited customers to buying one or two iPhones per person, and she became incredibly flustered. The guy who sold his spot in the line celebrated with a free phone.
What you’re describing is analogous and there’s a reason that went viral on the internet and was reported on in the mainstream, but I won’t spell it out for you.
How long have you been here? It's not surprising at all. HN and YC have not demonstrated an aversion to "uh, greyhat" activity.
If it were 2000, people would be sharing their ad clicking startups.
YC has funded a looooooot of sketchy companies.
Residential proxies are not necessarily "black market".
It's almost never done with the full understanding of the person providing the proxy, doesn't matter if they get promised some change, their browser addons betray them or they install bundleware/adware.
I'd say it has about the same moral standing as a payday loan.
There are other ways, for example through mislabeled "residential" blocks, or "residential" proxies that are sold by ISPs to vendors.
Here's more on "free VPNs":
https://www.kaspersky.com/blog/what-is-wrong-with-free-vpn-s...
Usually such proxy networks are outright criminal (even if users are not).
It’s not necessarily malware. There are services that are pretty upfront and pay cash money for residential US bandwidth. That said, naive people might be surprised when their IP starts getting blocked.
e.g. https://www.honeygain.com/ (something like 100GB = $20).
>That said, naive people might be surprised when their IP starts getting blocked.
Or law enforcement shows up at their door because their IP is involved in a bunch of illegal stuff.
How does expensive bandwidth equate to dodgy clients? There are lots of valid use cases for scraping data, and it's legal to scrape publicly available data, even if the websites hosting it try to block it (try a curl request to reddit, for example)
>>>and it's legal to scrape publicly available data, even if the websites hosting it try to block it
Is that something that's been fully decided? https://en.wikipedia.org/wiki/Craigslist_Inc._v._3Taps_Inc. is the most relevant case I'm aware of, and it suggests it might actually be illegal (if you know you've been blocked, at least).
https://techcrunch.com/2024/01/24/court-rules-in-favor-of-a-...
This is another interesting example where it was allowed
Aren't there also some suspiciously cheap VPNs that do that in the background?
An employee of one proxy network describes that exact business model here:
https://news.ycombinator.com/item?id=41597315
Yes
Yeah, the author confirmed it in this thread actually:
https://news.ycombinator.com/item?id=41594713
Residential proxy service
https://smartproxy.com/proxies/residential-proxies/pricing
(may not be this service, but this is an example, and the price is consistent with their larger commitments)
Absolutely wild. A normal price for bandwidth before volume discounts is 1c/GB, or 10 bucks per TB
They're in the business of scraping/botting sites that don't want to be scraped/botted, and bandwidth that looks "legit" comes at a premium.
I downloaded world of Warcraft the other day, 100GB, took less than 3 hours and you can be sure it didn’t cost blizzard $0.05.
Blizzard quite famously used BitTorrent to save bandwidth, dunno if they still do:
https://wowpedia.fandom.com/wiki/Blizzard_Downloader
Looks like webshare from the screenshot
api.skyvern.com is a CNAME to an EC2 ALB, but even using a NAT Gateway ($$$) I can't make more than $1/GB add up.
The discussion linked in the post is from 2022, and the corresponding issue has already been fixed:
https://issues.chromium.org/issues/40220332
I wonder if there is a more recent bug related to this?
I think that Chrome is still doing a lot of downloads. They're just no longer showing them to the user.
And uploads.
I would have liked to see a bit more of 5 Whys here. It seems like a consistent lesson that startups have to learn over and over is how to manage external dependencies, and particularly the dangers of having Google as a dependency. This is new Chrom(e|ium) behavior, and it has a real cost, both for this company and for users, which may or may not be worth the ROI, but this is what happens when you have a large scale external dependency: stuff moves without your knowledge, consent, or control.
Instead of Always. Be. Closing. it should be Always. Be. Mitigating. Dependencies. for startups.
This is a great callout.
We had an internal discussion about how to manage dependencies effectively, and we made the decision to accept the risk that comes with blindly relying on Chrome for now, instead of investing heavily in mitigating that risk today.
The main motivator was for us to continue moving fast, and accept that we have a few hard dependencies in our business.
The goal is to find product market fit, then allocate time to de-risk some of these hard dependencies. If we fail to find product market fit, this may not matter at all
I think that's a fair strategy. Strong PMF generally overcomes weak execution, the challenge is that when you have hard dependencies on entities like Google or Apple it can easily become existential. Even if you choose to move forward with this dependency you should establish guard rails within your system to ensure you catch shifts faster that may be impactful and have a plan for mitigation. For instance, you should identify key points of integration and possible alternatives even if you choose not to migrate now, so that a future migration is better understood and can be discussed intelligently in the heat of the moment. Even internal documentation can assist as a mitigation for dependency risk.
Yeah exactly. One action item from this is that we need to add anomaly detection to our proxy usage metrics so we can catch this in 15 minutes instead of 6 hours :)
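As a rough idea of what such a check could look like (not Skyvern's actual implementation), here is a minimal sketch that compares each interval's proxy usage against the median of recent intervals. The class name, the 5x factor, and the sample numbers are all assumptions for illustration.

```python
from collections import deque
from statistics import median


class UsageSpikeDetector:
    """Flags an interval whose usage far exceeds the recent median."""

    def __init__(self, window: int = 12, factor: float = 5.0, min_baseline: int = 3):
        self.history = deque(maxlen=window)  # recent "normal" intervals
        self.factor = factor                 # how many times the median counts as a spike
        self.min_baseline = min_baseline     # intervals needed before alerting

    def observe(self, usage_gb: float) -> bool:
        """Record one interval's usage; return True if it looks like a spike."""
        spike = False
        if len(self.history) >= self.min_baseline:
            baseline = median(self.history)
            spike = baseline > 0 and usage_gb > self.factor * baseline
        if not spike:
            # Spikes are kept out of the baseline so a sustained leak keeps alerting.
            self.history.append(usage_gb)
        return spike


# Hypothetical GB used per 15-minute interval; the last two resemble a runaway download.
samples = [1.0, 1.2, 0.9, 1.1, 9.5, 9.8]
detector = UsageSpikeDetector()
alerts = [detector.observe(s) for s in samples]
print(alerts)
```

With 15-minute intervals, the first abnormal interval raises an alert, which is the difference between noticing in 15 minutes and noticing in 6 hours. A real setup would likely also want a hard spend cap as a backstop.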
What infrastructure is this using? Bandwidth seems pretty pricy
No kidding. AWS's notoriously expensive data transfer is only $0.09/GB. Who's charging $2.50/GB? Are they running on a cellular SIM with no data plan?
Residential rotating proxy providers charge very high rates for data, on the order of $1 - $10 per GB. (These providers often do run their proxies through the cellular network, actually.)
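To make the gap concrete, here is a small sketch that prices the same 200GB at the per-GB rates mentioned in this thread. The rates are the commenters' figures, not authoritative pricing, and the dictionary labels and `bill` helper are made up for illustration.

```python
# USD per GB, taken from figures quoted in this thread.
RATES_USD_PER_GB = {
    "commodity bandwidth (1c/GB)": 0.01,
    "AWS data transfer": 0.09,
    "residential proxy (this incident)": 500 / 200,
}


def bill(gigabytes: float, rate_usd_per_gb: float) -> float:
    """Cost in USD of moving `gigabytes` at a flat per-GB rate."""
    return gigabytes * rate_usd_per_gb


for name, rate in RATES_USD_PER_GB.items():
    print(f"{name}: ${rate:.2f}/GB, 200 GB costs ${bill(200, rate):,.2f}")
```

The incident rate of $2.50/GB sits inside the $1 - $10 range quoted above, and is roughly 28x the AWS figure.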
Is this something where end users can get paid for doing nothing other than proxying some traffic through their ISP?
The end user typically has their device compromised by using free apps where the developers were bribed $$$ to add the proxy "SDK". The botnet operator then rents out the bandwidth at exorbitant rates to anyone who will pay for it.
Chrome extensions are also a huge source of this, they look for extensions with a large install base and then make an offer to buy it to turn all the users into proxies.
end users install shady VPN apps/extensions to watch pirated content, and become part of residential proxy mesh/botnet
That's probably where some of the proxies come from.
Yes. Google “honeygain”
Sure, if you want a whole bunch of legitimately malicious traffic to be attributed to your internet account.
If by “some” traffic you mean botnets, sneaker and ticket scalpers, scammers, content scrapers, credential stuffers … generally scummy stuff, sure.
Based on this blog post I would not do any business with Skyvern, if they indeed do business with this underworld of bottom feeders.
Sounds like they are running a web scraping business -- so maybe? Using a cellular connection would be one way to help not get immediately CAPTCHA-ed by every site using Cloudflare.
They should really set up their scraper (and exfil the data) via regular connections.
Please, gigabyte isn't a unit of bandwidth.
Bandwidth is measured in data/time
Tell that to every single ISP and Cell provider.
Data volume makes more sense than bandwidth: https://www.telekom.de/prepaid-aktivierung/en/start
Their explanation of bandwidth looks fine as well: https://dih.telekom.com/en/glossary/bandwidth
Skyvern is a great name, very evocative. Typical arrogant Google, downloading trash to the user without consent.
You guys should look into some unlimited bandwidth options. I use https://scrapingfish.com/unlimited
This is really cool! I'll check it out :)
You're clearly associated with scrapingfish and not just a customer, your entire comment history is just shilling for them.
"We run leverage proxy networks and run headful browser instances"
Um...say what? I'm pretty broadly based in IT, and I have no idea what that means.
Haha, apologies for the language!
We use residential proxy networks when running Skyvern to help simulate real human behaviour (because that's what Skyvern is trying to do).
We run headful browser instances (meaning a real chrome instance running with a real viewport) for the same cause!
Honestly given many of these stories, $500 seems to be getting off pretty lightly.
It’s still absurd to me that many (most?) of these hosting/bandwidth providers don’t seem to allow automatic cut-offs and such
It definitely could have been much worse. We burned through our monthly allocation in 6 hours HAHA, I'm grateful that our allocation wasn't something like 10TB
yeah, that could have been "exciting" :-O
Blocking Google from downloading anything onto your computer without consent is always a good idea.
We were pretty careful about what we were blocking here -- had the exact same concern. Hopefully it doesn't come back to bite us in the future (new blogpost incoming?)
Especially if you're using expensive bandwidth from botnets.
>200GB of proxy bandwidth
Gigabyte is a measure of information.
Bandwidth is information transmitted over time.
You shouldn’t be paying by the terabyte. Colocate and just pay for the maximum throughput. Far better rates
doesn't work when the sites you're scraping block the IPs/range of your server. They're using a proxy botnet that costs a premium
makes sense but you shouldn't be paying a premium for a non-premium service. IP blocks and bandwidth have low unit cost at scale.