
Comment by AnthonyMouse

4 years ago

> Hardware is cheap. People are expensive.

Except that you still need the people, because most of the labor isn't putting the hardware in the rack, it's managing the software which you have to do regardless of where the hardware is.

> Besides that, procuring resources with your cloud provider is simply a matter of writing a yaml file.

That is no different than it is locally.

> Not to mention the lack of upfront investment and only paying for the resources you need, instead of spending money on hardware because you have to have enough of it to handle peak load.

But hardware is cheap, remember? And most companies don't actually have large load variations.
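To make the peak-load trade-off concrete, here is a back-of-the-envelope sketch. It covers only the hardware line item, and every rate and server count in it is a made-up assumption rather than a real price. Under those assumptions, owned hardware sized for peak stays cheaper as long as the peak-to-average load ratio is below the ratio of the cloud rate to the owned rate.

```python
# Back-of-the-envelope sketch; all rates and server counts are hypothetical.
HOURS_PER_MONTH = 730


def owned_monthly_cost(peak_servers, owned_rate):
    """Owned hardware is sized for peak load and paid for all month."""
    return peak_servers * owned_rate * HOURS_PER_MONTH


def cloud_monthly_cost(average_servers, cloud_rate):
    """Idealized elastic cloud usage: pay only for the average load."""
    return average_servers * cloud_rate * HOURS_PER_MONTH


def break_even_peak_to_average(owned_rate, cloud_rate):
    """Peak/average ratio above which the elastic option becomes cheaper."""
    return cloud_rate / owned_rate


if __name__ == "__main__":
    owned_rate = 1.0  # hypothetical amortized cost per server-hour
    cloud_rate = 3.0  # hypothetical on-demand cost per server-hour
    peak, average = 10, 5  # hypothetical load, in servers

    print("owned:", owned_monthly_cost(peak, owned_rate))
    print("cloud:", cloud_monthly_cost(average, cloud_rate))
    print("break-even peak/average:",
          break_even_peak_to_average(owned_rate, cloud_rate))
```

With a flat load profile the ratio is close to 1, so the elasticity argument only starts to matter once the per-hour premium is smaller than the load swing.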

> But can you run a data center in multiple regions?

Obviously yes. Any company of non-trivial size would have multiple sites and could locate a host at more than one. This doesn't even necessarily raise the price, because you already need enough machines to provide redundancy, so locating them at different sites doesn't even require additional hardware, only locating some of the existing hardware at other sites.

This is also mostly overrated for companies smaller than that, because cloud providers have had company-wide outages at a frequency not all that much higher than site-wide outages for sites that have a reasonable level of redundancy.

> Besides that any cloud provider offers more than just a bunch of VMs. AWS alone has 260 services with an entire team of people keeping them patched and optimized.

This is only relevant if you're using 260 different services and not just a bunch of VMs, and plenty of companies are using just a bunch of VMs.

As a manager of a team of application product developers, I can tell you that the headcount cost of ops teams, and the time cost of pulling in people whose jobs shouldn't involve VM provisioning overhead but nonetheless do, are both huge compared to cloud services. With cloud tooling, my team, none of whom have any experience with VM provisioning, can get production systems up, add logging, add alerting, add networking, etc., all very easily or with just low-touch overhead from teams that manage best practices or compliance. Creating the same internal developer tool experience with data centers is SO expensive and requires a major headcount investment.

  • It's always shocking to see how inefficiently some companies are operated.

    The things you're describing should take a single individual a matter of seconds for a system which is already in operation, and a one-time cost of a few hours to set up at the outset (i.e. once or twice a decade). If it takes significantly longer to do locally than it takes to input into the cloud provider's interface, something's not right.

    I can tell you where most companies go wrong here though. It's in excessive specialization. If you put separate people in charge of provisioning, networking, logging, etc. then you create a ton of friction to do anything because you need five different humans to touch it and they all have to coordinate. One person can do all of those things, as you've learned when one person does it interacting with the cloud providers. And when one person is doing all of it, it takes only seconds to do.

    • I listed all of the services we used across five environments, most of them across three availability zones. So one person was going to manage what on-prem would be roughly 200 VMs/services, and make sure they stay patched and the OS stays updated? Locally, a lot of those services would run in a cluster for availability.

      There is no company on earth that has one person managing an on-prem implementation of that level of complexity.

      One person does it in a cloud environment because they aren’t managing hardware, patching, etc.


For context: my first exposure to the cloud was at my last company, which had 100 employees. We aggregated publicly available (i.e. no PII) health care provider data from all 50 states and government agencies, as well as various disease/health dictionaries, and we combined it with data sent to us from large health systems.

These are the services we used.

Infrastructure

- Route 53 (DNS)

- SQS/SNS (messaging)

- Active Directory.

- Cognito (SAML/SSO for our customers)

- Parameter Store/DynamoDB (configuration)

- CloudWatch (logging, monitoring, alerts, scheduling)

- Step functions (orchestration)

- Kinesis (stream processing). We were just introducing this when I left. I’m not sure what they were using it for.

CI/CD

We used GitHub for source control.

- CodePipeline (CI/CD orchestration)

- CodeBuild (Serverless builds. It would spin up a Windows or Linux Docker container and basically run PowerShell or Bash commands)

- self hosted OctopusDeploy server.

Data Storage

- S3 (Object/File storage)

- Redshift (OLAP database)

- Aurora/MySQL (OLTP RDBMS). When we had large indexing jobs to run into Elasticsearch, read replicas would autoscale.

- Elasticsearch

- Redis

Data Processing

- Athena (Serverless Presto queries against S3)

- Glue (Serverless PySpark environment)

Compute

- EC2 (Pet VMs and one autoscaling group of VMs to process data as it came in from clients. It ran a legacy Windows process)

- ECS/Fargate (Serverless Docker cluster)

- Lambda (for processes where we needed to scale from 0 to $alot for backend processes)

- Workspaces (Windows VMs hosted in the US as Dev machines for our Indian Developers who didn’t want to deal with the latency.)

- Layer 7 load balancers

Front end

- S3 (hosted static assets like html, JS, CSS. You can serve S3 content as a website.)

- CloudFront (CDN)

- WAF (Web Application Firewall)

All of the above infrastructure was duplicated for five different environments (DEV, QAT, UAT, Stage, Prod). In Prod, where needed, infrastructure was duplicated in multiple availability zones (not regions).

Where applicable, backups were automated.

We had two full-time operations people. The rest was maintained by developers.

As for the rest:

> [Procuring resources] is no different than it is locally.

I can go from no infrastructure to everything I just named in a matter of hours locally? I can set up a multi-availability-zone MySQL database with automated backups just by running a yaml file locally, and then turn it off when not needed?
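To illustrate what the "yaml file" amounts to for the database case, here is a minimal sketch that builds a CloudFormation template for a Multi-AZ MySQL instance with automated backups. It is written as a Python dict and emitted as JSON, which CloudFormation accepts alongside YAML. The instance class, storage size, retention period, user name, and the `Database` and `DbPassword` names are illustrative assumptions, not the setup described above.

```python
import json


def mysql_multi_az_template(instance_class="db.t3.medium", backup_days=7):
    """Return a CloudFormation template dict for one Multi-AZ MySQL instance.

    All sizing values are hypothetical placeholders.
    """
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Passed in at deploy time so the password is not in the template.
            "DbPassword": {"Type": "String", "NoEcho": True},
        },
        "Resources": {
            "Database": {
                "Type": "AWS::RDS::DBInstance",
                "Properties": {
                    "Engine": "mysql",
                    "DBInstanceClass": instance_class,
                    "AllocatedStorage": "20",  # GiB, illustrative
                    "MultiAZ": True,  # standby replica in a second AZ
                    "BackupRetentionPeriod": backup_days,  # > 0 enables automated backups
                    "MasterUsername": "admin",
                    "MasterUserPassword": {"Ref": "DbPassword"},
                },
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(mysql_multi_az_template(), indent=2))
```

The printed template can be created as a stack and later deleted as a unit, which is the "turn it off when not needed" part. A real deployment would also need networking and security group settings that this sketch leaves at their defaults.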

  • Most of what you're listing are Layer 7 services. The time cost there is in the configuration. You can put Active Directory in the cloud, but it's still going to be Active Directory, i.e. a massively complicated proprietary framework that touches every Windows system in your network like an octopus.

    And some of those things actually make sense. You can't really locally host a CDN, can you? If you need a big amount of compute for an hour and then never again, it doesn't make much sense to buy hardware for that.

    But the point isn't that it never makes sense to put anything in the cloud at all. It's that companies regularly overuse it as some kind of buzzword panacea when there are only a specific set of things that it's actually good for.

    • It’s not just “configuration”. There is also the issue of continuous monitoring and upkeep. Not to mention someone has to worry about servers going down, hard drives going bad, backups. Would any one person know how to configure and maintain everything above?

      I’m a developer who happens to have AWS in my toolbelt. I could set all that up by myself. In the two years that I worked there, we never had an issue with any of it.

      How much in-house expertise would we have had to hire to manage everything that we used?
