
Comment by solatic

12 days ago

If you're writing a CRUD app and mocking your database calls instead of just starting an actual Postgres instance before running the tests, you're probably using mocking wrong.

If you're writing a custom frontend for GitHub using the GitHub API and don't bother writing a decent set of mocks for how you expect the GitHub API to behave, your app will quickly require either full manual QA at best or become untestable at worst. Some APIs are not very stable, and testing against the API itself can hit rate limiting, bans, and other anti-abuse mechanisms that introduce all kinds of instability to your test suite.

Use the right tools to solve your problems.

> If you're writing a custom frontend for GitHub using the GitHub API and don't bother writing a decent set of mocks for how you expect the GitHub API to behave, your app will quickly require either full manual QA at best or become untestable at worst. Some APIs are not very stable, and testing against the API itself can hit rate limiting, bans, and other anti-abuse mechanisms that introduce all kinds of instability to your test suite.

I've been doing E2E testing using 3rd-party APIs for a decade now, and this has yet to be a significant problem. The majority of the APIs I've worked with had a dedicated sandbox environment to avoid "rate limiting, bans, and other anti-abuse mechanisms". The remainder were simple enough that the provider didn't care about users exploring on the live API, and were usually read-only as well.

Did I run into the occasional flaky failure, or API stability issues? Sure. But it was very rare and easy to work around. It never devolved into becoming "untestable" or requiring "full manual QA".

My other teams that relied on mocks suffered from far worse problems: a ton of time spent on manual QA, and bugs that leaked into production because of mock-reality mismatches.

  • There are plenty of libraries out there, like VCR, that can set up a test and then save the response for future test runs. You don't really have to re-record the responses that often either (see the sketch at the end of this comment).

    That was always the go-to for me when testing against 3rd party services, especially because the tests would then survive the offboarding of the engineer who set them up with their personal credentials.

    If your test suite relies on live Github PATs or user-specific OAuth access tokens, then you can either figure out how to manage some kind of service account with a 'bot' user, or live with things breaking every time someone leaves the org.

    Services that incur a per-request charge, or consume account credits, are another problem. Especially if they don't have sandboxes.
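
    For example, with Python's vcrpy (the Ruby VCR and go-vcr work the same way), the first run records the real exchange into a cassette file and later runs replay it; the cassette path and test below are illustrative, not from any particular project:

      import requests
      import vcr

      # First run: hits the real API and records the exchange to the cassette file.
      # Later runs: replay the saved response, so no token or network is needed.
      @vcr.use_cassette("tests/cassettes/list_repos.yaml", filter_headers=["authorization"])
      def test_list_repos():
          resp = requests.get("https://api.github.com/users/octocat/repos")
          assert resp.status_code == 200
          assert isinstance(resp.json(), list)

    The `filter_headers` bit keeps the PAT out of the recorded cassette, which matters if cassettes get committed to the repo.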

  • Outside of the payments industry I haven't encountered many sandbox APIs that don't have rate limits. What are some good ones you've seen?

  • I have a custom frontend for GitHub using the GitHub API (https://github.com/fastai/ghapi/) and don't use mocks - I test using the real API. I've had the very occasional failure, but not enough to ever cause any real issues.

    I don't find mocks for this kind of thing very helpful, because what you're really testing for are things like how an API changes over time -- you need real API calls to see this.

  • Yeah, even if there's no sandbox mode, a separate sandbox account will usually do. Sometimes this catches misuse that would've caused rate-limiting in prod. And if a service makes this hard, maybe you shouldn't use it in prod either.

When you write tests with mocks you almost always at some point end up with tests that test your mocks lol, and tests that test that you wrote the tests you think you wrote -- not the software itself.

I’ve never been thrilled by tests that rely on mocking — it usually means you need to re-express your module interface boundary.

Mocks for me fall into the class of software I affectionately call “load-bearing paint.” It’s basically universally the wrong tool for any given job but that really doesn’t stop people. Putting in a data class or similar model object and a delegate is usually sufficient and a much better tool.

  • > It’s basically universally the wrong tool for any given job but that really doesn’t stop people.

    I find mocks useful for testing conditions that are on the periphery and would be a decent amount of trouble to set up. For instance, if I have a REST controller that has a catch-all for exceptions that maps everything to a 500 response, I want a test that causes the DAO layer to throw an exception and checks that the rest of the "real stack" does the translation correctly. A mock is the easiest way to accomplish that.

    • I agree. I will mock 3rd party APIs sometimes so I can test that my system correctly handles failures. For example, what if I get a 500 response from the API? With my mock I can easily make that happen. If I was using the actual API, I would have no way of forcing a 500 to happen.
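
      A minimal sketch of that idea using Python's stdlib unittest.mock; the handler and DAO here are made-up stand-ins, the point is forcing a failure path that's hard to trigger against a real backend:

        from unittest.mock import Mock

        class UserDao:                          # stand-in for the real DAO layer
            def fetch(self, user_id): ...

        def get_user(dao, user_id):
            # catch-all that maps unexpected errors to a 500-style response
            try:
                return {"status": 200, "body": dao.fetch(user_id)}
            except Exception:
                return {"status": 500, "body": "internal error"}

        def test_dao_failure_maps_to_500():
            dao = Mock(spec=UserDao)
            dao.fetch.side_effect = RuntimeError("connection reset")  # force the failure
            assert get_user(dao, 42)["status"] == 500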

  • I agree that if you need to write mocks, it's likely that your interfaces are poorly defined. This is one of the claimed benefits of test driven development - writing the tests first forces you to design the code in a way that cleanly separates modules so they can be tested.

    • You have to mock/fake when modules call dependencies.

      Your way means you only ever have siblings, with an orchestrator pulling results out of one module and pushing them into another.


Re. postgres, this is actually something I have always struggled with, so would love to learn how others do it.

I’ve only ever worked in very small teams, where we didn’t really have the resources to maintain nice developer experiences and testing infrastructure. Even just maintaining representative testing data to seed a test DB as schemas (rapidly) evolve has been hard.

So how do you

- operate this? Do you spin up a new postgres DB for each unit test?

- maintain this, eg have good, representative testing data lying around?

  • Docker Compose is a super easy way to run Postgres, Redis, etc. alongside your tests, and most CI platforms can either use a Compose file directly or have a similar way of running service containers alongside your tests. Example: https://docs.github.com/en/actions/using-containerized-servi...

    Typically you'd keep the database container itself alive, and you would run the schema migrations once at startup. Then your test runner would apply fixtures for each test class, which should set up and tear down any data they need to run or that they create while running. Restarting the database server between each test can be very slow.

    The test data is a harder problem to solve. For unit tests, you should probably be creating specific test data for each "unit" and cleaning up in between each test using whatever "fixture" mechanism your test runner supports (a sketch of this shape is at the end of this comment). However, this can get really nasty if there are a lot of dependencies between your tables. (That in and of itself may be a sign of something wrong, but sometimes you can't avoid it or prioritize changing it.)

    You can attempt to anonymize production data, but obviously that can go very wrong. You can also try to create some data by using the app in a dev environment and then use a dump of that database in your tests. However, that's going to be very fragile, and if you really need hundreds of tables to be populated to run one test, you've got some big problems to fix.

    Property-based testing is an interesting alternative, where you basically generate random data subject to some constraints and run your tests repeatedly until you've covered a representative subset of the range of possible values. But this can be complicated to set up, and if individual tests aren't fast, the suite can take a very long time to run.

    I think at the end of the day, the best thing you can do is decouple the components of your application as much as possible so you can test each one without needing giant, complicated test data.
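
    A rough pytest shape for the above -- the Compose-managed Postgres stays up, migrations run once per session, and each test cleans up its own data. The DSN, `run_migrations` helper, and table names are assumptions for the sketch:

      import psycopg2
      import pytest

      DSN = "postgresql://app:app@localhost:5432/app_test"  # the Compose-managed service

      @pytest.fixture(scope="session")
      def db():
          conn = psycopg2.connect(DSN)
          run_migrations(conn)   # hypothetical helper: applies the schema once per session
          yield conn
          conn.close()

      @pytest.fixture
      def clean_db(db):
          yield db
          with db.cursor() as cur:   # per-test teardown of whatever the test created
              cur.execute("DELETE FROM orders; DELETE FROM users;")
          db.commit()

      def test_create_user(clean_db):
          with clean_db.cursor() as cur:
              cur.execute("INSERT INTO users (name) VALUES (%s) RETURNING id", ("alice",))
              assert cur.fetchone()[0] is not None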

  • We use TestContainers for this, and it's superb. It's a full instance of the DB, started for each unit test, running inside a docker container. TC does smart things to make sure it doesn't slow the suite too much.

    We have the same strategy for testing against Kafka, etc.

    Where we care about data, we seed the db with data for a specific group of tests. Otherwise, we just nuke the db between each test.

    Prior to doing this, we'd use an in-memory db for tests and a real db at runtime, using JPA / Hibernate to make things transferable. But this was leaky, and some things would pass in tests then fail at runtime (or vice versa).

    TestContainers has been so much better: since we're running against a real version of the database, there's a much smaller chance of test and runtime diverging.
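
    The thread above is JVM, but the same shape in Python's testcontainers package is only a few lines (assuming SQLAlchemy for the connection; the query is just a smoke test):

      import pytest
      from sqlalchemy import create_engine, text
      from testcontainers.postgres import PostgresContainer

      @pytest.fixture(scope="session")
      def engine():
          # one throwaway Postgres in Docker for the whole suite, removed on teardown
          with PostgresContainer("postgres:16") as pg:
              yield create_engine(pg.get_connection_url())

      def test_against_real_postgres(engine):
          with engine.connect() as conn:
              assert "PostgreSQL" in conn.execute(text("select version()")).scalar()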

  • TestContainers, or just assume there is a postgres running locally.

    > - maintain this, eg have good, representative testing data lying around?

    This can be tricky, but usually my advice is to never even try to write seed data into the database unless it's very static. It just gets annoying to maintain and will often break. Try to work out a clean way to set up state in your tests using code, and do not rely on magic auto-increment ids. One of the more effective patterns I have found is, for example, to have every test create a fresh customer, then the test does its work on that customer. Avoid tests assuming that the first object you create will get id == 1; that makes them very annoying to maintain.
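
    A sketch of the "fresh customer per test" pattern; the table, helper, and `db` fixture are illustrative, the point is that no test ever assumes a particular id:

      import uuid

      def create_customer(conn, name=None):
          # every test makes its own customer; nothing relies on id == 1
          name = name or f"customer-{uuid.uuid4().hex[:8]}"
          with conn.cursor() as cur:
              cur.execute("INSERT INTO customers (name) VALUES (%s) RETURNING id", (name,))
              return cur.fetchone()[0]

      def test_rename_customer(db):   # `db`: a real connection from your test setup
          customer_id = create_customer(db)
          with db.cursor() as cur:
              cur.execute("UPDATE customers SET name = %s WHERE id = %s RETURNING name",
                          ("Acme", customer_id))
              assert cur.fetchone()[0] == "Acme"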

    • This is a big one. A term of art for this is "General Fixture", and for xUnit type testing I consider it to be an anti-pattern.

      There's times when a big test fixture can provide value, but it's very context dependent and almost never for smaller/micro tests.

  • > operate this? Do you spin up a new postgres DB for each unit test?

    Generally I've seen a new database (a schema, in other DBs) in Postgres that exists just for testing, e.g. "development_test" vs "development". The big thing is to wrap each of your tests in a transaction which gets rolled back after each test (see the sketch at the end of this comment).

    > maintain this, eg have good, representative testing data lying around

    This is much harder. Maintaining good seed data - data that covers all the edge cases - is a large amount of work. It's generally easier to leave it up to each test to set up data specific to its test case, generalizing that data when possible (e.g. if you're testing login endpoints, you have all your login test cases inherit from some login-specific data setup, and they can tweak as needed from there). You will end up with duplicated test setup logic. It's not that bad, and often you don't really want to DRY this data anyways.

    That being said, if you have the time and resources to maintain seed data it's absolutely a better way to go about it. It's also beneficial outside of tests.
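
    The rollback-per-test idea mentioned above, sketched with SQLAlchemy (a plain DB-API connection works the same way); the connection URL and `users` table are placeholders:

      import pytest
      from sqlalchemy import create_engine, text

      engine = create_engine("postgresql://app:app@localhost/development_test")  # placeholder

      @pytest.fixture
      def db_conn():
          conn = engine.connect()
          trans = conn.begin()    # open a transaction for the test...
          yield conn
          trans.rollback()        # ...and throw away everything it wrote
          conn.close()

      def test_insert_is_rolled_back(db_conn):
          db_conn.execute(text("INSERT INTO users (name) VALUES ('temp')"))
          found = db_conn.execute(text("SELECT count(*) FROM users WHERE name = 'temp'")).scalar()
          assert found == 1       # visible inside the transaction, gone after rollback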

    • > Generally I've seen a new database (schema in other dbs?) in postgres that is for testing, i.e "development_test" vs "development".

      Every place I've ever worked which tried this has managed to get a production database deleted by somebody running tests.


  • Create one DB for the whole test suite, and then re-instantiate tables/schemas on every unit test.

  • I have tried various approaches and here's what worked best, assuming that there is some natural way to partition most of the data (e.g. per account):

    1. Init the DB with some "default" data - configuration, lookup tables, etc

    2. Each test in the test suite owns its data. It creates a new account and inserts new records only for that account. It can for example create users on this account, new entities, etc. It can run multiple transactions, can do rollbacks if needed. It is important to only touch the account(s) created by the test and to avoid touching the initial configuration. There's no need to clean up the data after the test finishes. These tests can run concurrently.

    3. Create a separate integration test suite which runs sequentially and can do anything with the database. Running sequentially means that these tests can do anything - e.g. test cross-account functionality, global config changes or data migrations. In practice there aren't that many of those, most tests can be scoped to an account. These tests have to clean up after themselves so the next one starts in a good state.

    Other approaches had tons of issues. For example, if each test is wrapped in a transaction which is later rolled back, then testing is very limited: tests cannot use transactions on their own. Savepoints have a similar issue.

  • At several places I've worked, we would snapshot the production DB and use that for testing. You cannot get more "real-world" than that. We would also record real requests, and replay them (optionally at increased speed) for load testing.

    Obviously, there are some caveats, e.g.:

    * While this approach works perfectly for some tests (load testing, performance testing, …), it does not work for others (e.g. unit testing).

    * You have to be careful about PII, and sanitize your data.

  • I run a replicated copy of the production database on top of zfs and snapshot it before starting tests. PostgreSQL takes a few seconds to start on the snapshot and then you're off to the races with real production data. When the test suite finishes, the snapshot is discarded. This also ensures that migrations apply correctly to the production db before they are run against the actual prod instance.

  • I feel that trying to maintain "representative testing data" is generally not a good idea; set up the data you want/need in the test instead.

    Just run PostgreSQL on your local machine, connect to that, and set up a new schema for every test (fairly cheap) inside a test database.

      def Test1:
        setupdb()
        obj1 = createObj1()
        obj2 = createObj2()
        have = doStuff(obj1, obj2)
        if have != want: ...

      def Test2:
        setupdb()
        obj1 = createObj1()
        have = doOtherStuff(obj1)
        if have != want: ...
    

    Creating reasonably scoped reasonably contained "unit-y tests" like this means you will actually be able to understand what is going on. Too often have I seen people set up huge wads of "mock data" and then run all their tests on this. Then Test1 does something Test2 doesn't expect and you're screwed. Or worse: Test42 does something that screws Test185. Good luck with that. Or you introduce a regression somewhere and now you've got tons of data to understand.

    • Yeah the keys to make it all work are

      1. It's easy to create the objects you need

      2. Your creation functions are well tested so that the rest of your tests can rely on them.

      If you have spotty coverage or just poorly defined creation semantics, or it's a bunch of calls to functions all over the place just to set up your test data, then this doesn't work.

      But the solution typically isn't "write a bunch of JSON mock test data", it's to solve those problems.

  • The ideal experience is that you anonymize prod and sync it locally. Whether it's for testing or debugging, it's the only way to get representative data.

    When you write mock data, you almost always write "happy path" data that usually just works. But prod data is messy and chaotic, which is really hard to replicate manually.

    This is actually exactly what we do at Neosync (https://github.com/nucleuscloud/neosync). We help you anonymize your prod data and then sync it across environments. You can also generate synthetic data as well. We take care of all of the orchestration. And Neosync is open source.

    (for transparency: I'm one of the co-founders)

  • I made https://github.com/boustrophedon/pgtemp to solve this for myself

    • I don't understand why everyone doesn't just do this unless they're working with really large volumes of test data. It literally takes a fraction of a second to mkdir, call initdb, and open a Postgres socket.

      I don't know if you've solved this, but PG doesn't like to bind to port 0, so you have to manage ports. And I've had issues with processes sticking around if the test driver has crashed (I don't currently, but I'm turning off setsid in postgres).


  • Great answers below (test containers for example).

    However, it’s not always possible.

    For example:

    - You use Oracle DB (takes minutes to start, licensing, hope the containers run fine on ARM, etc.). Sometimes an in-memory fake is just much faster, and can be an official DB in its own right for people to try the product.

    - Your storage might only be available through a library from a third-party provider that is not available locally.

    • Could you leave a test Oracle DB running all the time and clear it between tests? I do this with Postgres.

  • I've been on teams where we've done this (very successfully in my opinion!) by creating helper code that automates creating a separate Postgres schema for each test, running all migrations, then running your test function before tearing it all down again (roughly the shape sketched at the end of this comment). This all runs on CI/CD and developer machines, no credentials to any actual environments.

    A major benefit of doing separate schemas for each test is that you can run them in parallel. In my experience, unless you have a metric ton of migrations to run for each test, the fact that your database tests can now run in parallel makes up (by a lot!) for the time you have to spend running the migrations for each test.

    EDIT: usually we also make utilities to generate entities with random values, so that it's easy to make a test that e.g. tests that when you search for 5 entities among a set of 50, you only get the 5 that you know happen to match the search criteria.
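
    The helper roughly amounts to something like this; `apply_migrations` and the DSN are stand-ins for whatever migration runner and config you already have:

      import uuid

      import psycopg2
      import pytest

      @pytest.fixture
      def schema_conn():
          schema = f"test_{uuid.uuid4().hex}"   # unique schema per test => safe to parallelise
          conn = psycopg2.connect("postgresql://app:app@localhost/app_test")  # placeholder DSN
          with conn.cursor() as cur:
              cur.execute(f'CREATE SCHEMA "{schema}"')
              cur.execute(f'SET search_path TO "{schema}"')
              apply_migrations(cur)             # hypothetical: create this schema's tables
          conn.commit()
          yield conn
          with conn.cursor() as cur:
              cur.execute(f'DROP SCHEMA "{schema}" CASCADE')
          conn.commit()
          conn.close()

    Something like pytest-xdist can then spread the tests across workers, since each one only ever touches its own schema.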

    • Running all migrations before every test can take you a surprisingly long way.

      Once that gets a bit too slow, running migrations once before every suite and then deleting all data before each test works really well. It's pretty easy to make the deleting dynamic by querying the names of all tables and constructing one statement to clear the data, which avoids referential integrity issues. Surprisingly, `TRUNCATE` is measurably slower than `DELETE FROM`.

      Another nice touch is that turning off `fsync` in postgres makes it noticeably faster, while maintaining all transactional semantics.

  • My integration tests expect the db to run. If I need fixture data, those are sql and read in at the start of the suite. Each test uses its own temp db/tables and/or clears potentially old data before running.

  • Firstly, I'm seeing all these answers that say spin up a new server, and I have to wonder "WTF?"

    No need to spin up a new server, not in a container, not in a new directory, not at all. It's pointless busywork with too many extra points of failure.

    Nothing is stopping you using an existing server and creating a new DB, which takes about 1/100th the time that starting up a new server (whether in Docker or otherwise) takes.

    Secondly, I don't actually do unit-testing on the database layer - there's little point in it. I test workflows against the database, not units!

    What I do is create multiple 'packages' of tests, each with multiple tests. A single 'package' creates a temp db, runs its tests sequentially, and then drops the temp db. Each package sets itself up with SQL statements.

    This lets the tests exercise actual workflows, instead of testing in isolation that an object can be (de)serialised. IOW, I can test that the sequence of `addUser(); setProfilePassword(); signIn(); viewInfo(); signOut();` works as expected, and that `removeUser(); signIn();` fails with the correct error.
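
    A sketch of that per-package shape; `load_schema` and the workflow functions stand in for whatever the package under test actually exposes:

      import psycopg2

      ADMIN_DSN = "postgresql://postgres@localhost/postgres"  # the already-running server

      def create_temp_db(name):
          admin = psycopg2.connect(ADMIN_DSN)
          admin.autocommit = True   # CREATE/DROP DATABASE can't run inside a transaction
          with admin.cursor() as cur:
              cur.execute(f'CREATE DATABASE "{name}"')
          admin.close()

      def drop_temp_db(name):
          admin = psycopg2.connect(ADMIN_DSN)
          admin.autocommit = True
          with admin.cursor() as cur:
              cur.execute(f'DROP DATABASE "{name}"')
          admin.close()

      def test_user_workflow():
          create_temp_db("pkg_users_test")
          conn = psycopg2.connect("postgresql://postgres@localhost/pkg_users_test")
          try:
              load_schema(conn)                    # hypothetical: the package's setup SQL
              add_user(conn, "alice", "s3cret")    # hypothetical workflow functions under test
              set_profile_password(conn, "alice", "n3w")
              assert sign_in(conn, "alice", "n3w")
          finally:
              conn.close()
              drop_temp_db("pkg_users_test")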

  • > So how do you
    > ...
    > - maintain this, eg have good, representative testing data lying around?

    This one can be very easy, depending on the kind of data you're working with. Many places will simply dump a part (or the whole, if it's not too big) of the production DB into dev and pre-prod environments.

    Now if there is sensitive, non-encrypted data that even the devs cannot see, then it can get tricky (but then arguably they cannot see the logs in the clear either, etc.).

    But yeah: a recent dump of the prod DB is good, representative data.

    I've worked at places where pre-prod had a daily dump of the prod DB. Simple.

Also, don't create any interfaces you wouldn't mock. I've seen too many people waste months creating some kind of "database wrapper."

> If you're writing a CRUD app and mocking your database calls instead of just starting an actual Postgres instance before running the tests,

Actually that's wrong too. The production database will be different than the "testing Postgres instance", leading to bugs.

It turns out that whatever testing solution you use, if it's not the actual production instance and you're not using real production data, there will be bugs. Even then there's still bugs.

This is the simple truth: you can't catch all the bugs. Just put in Good Enough testing for what you're doing and what you need, and get on with life. Otherwise you will spend 99% of your time just on testing.

  • > The production database will be different than the "testing Postgres instance", leading to bugs.

    It never happened to me, to be honest. This reads as an argument for "if you can't do perfect, just do it badly", but it's nonsense. Running tests against a local Postgres instance with the same major.minor version and same extensions as your prod instance WILL work.

    And testing your storage layer against the database is probably the most reliable safety net you can add to an app.

    • > Running tests against a local Postgres instance with the same major.minor version and same extensions as your prod instance WILL work.

      A team I worked with recently said the same thing. But, as I predicted, they ran into bugs because the CloudSQL Postgres was different than their Dockerized Postgres, even though it was the same core version.

      There will always be testing problems you can't anticipate. Especially with systems that are not your own code. Just be ready to adapt your testing when it doesn't work as expected, and don't invest too much in the testing if it's not worth it.


One of the nice things about the .NET ORM EntityFramework is that you can swap a mocked in-memory database for your prod DB with dependency injection, so without modifying your code at all and theoretically without affecting the behavior of the ORM. Which is to say, you're right, it's about using the right tools. Those tools of course vary by ecosystem and so in some cases mocking the database is in fact the correct decision.

  • Probably the single most obnoxious production defect I ever found related to a database would never have made it into production if we had been using a real database instead of a test double. It happened because the test double failed to replicate a key detail in the database's transaction isolation rules.

    After figuring it out, I swapped us over to running all the tests that hit the database against the real database, in a testcontainer, with a RAM disk for minimizing query latency. It was about a day's worth of work, and it turned up a few other bugs that hadn't bitten us in production yet; those were also sailing past our test suite because the test double failed to accurately replicate the behavior in question.

    Total time to run CI went up by about 10 seconds. (For local development you could chop that way down by not starting a fresh server instance for every test run.) Given how many person-hours we spent on diagnosing, resolving, and cleaning up after just that first defect, I estimated the nominally slower non-mocked tests are still a net time saver if amortized over anything less than about 50,000 CI runs, and even then we should probably only count the ones where an engineer is actually blocking on waiting for the tests to complete.

    That said, there was a time when I thought test doubles for databases was the most practical option because testing against real databases while maintaining test isolation was an unholy PITA. But that time was 5 or 6 years ago, before I had really learned how to use Docker properly.

    • I simply don't think that I will ever be able to come up with anything even vaguely as comprehensive as the test coverage that Microsoft already has for ensuring their ORM behaves consistently across database providers. In my over 10 years of using EF, I have never once encountered a database bug like you describe. If I were to discover such a bug (which I'll admit does occasionally happen even though it hasn't happened to me), it would be easier and better by far to submit an issue to the EF team and let them figure out a fix (including the appropriate tests) than it would be to rework my own test infrastructure. I am not in the business of developing requirements or code for databases, and building an elaborate test model for what I consider the essential requirements for a database would be a distraction from developing code that is more valuable to the business.

      The same logic does not apply to all ORMs, of course, which do not all benefit from the same quality of professional dev support that EF receives from MS. But that's my main point from above: the correct design decision depends on the context. For services written in other languages with other ORMs or raw SQL, I absolutely will spin up a full Postgres test container because it is indeed trivial (have one running in the background on my laptop right now in fact). It just isn't necessary in the specific context of EntityFramework code.


    • Using an in-memory database does not increase my confidence in my software either. I also started using dockerised dependencies in tests a couple of years ago.

      Can you please explain what you did with a RAM disk to speed them up?


> Some APIs are not very stable, and testing against the API itself can hit rate limiting, bans, and other anti-abuse mechanisms that introduce all kinds of instability to your test suite.

Those rate limits, bans, and other anti-abuse mechanisms are things that would be good to uncover and account for during tests. Better for the test suite to detect those potential failures than the production deployment :)

And if you have to mock, at least try to have somebody else write the mock. Testing your understanding of GitHub's API against your understanding of GitHub's API isn't useful. Testing your interpretation of the API behavior against somebody else's interpretation provides a lot more value, even if it isn't nearly as good as testing against the actual API.

Clearly, mocking the DB is a footgun, and it's not that hard to set up e2e tests. Use TestContainers or Docker on a random port, and run your API on a random port.

Every test seeds all the data it needs to run (user, org, token). It requires an initial setup, but then you just reuse it everywhere, and voila. No side effects, no mocks to maintain, and it also tests your auth and permissions, almost 1:1 with prod.

  • > No side effects, no mocks to maintain, and it also tests your auth and permissions, almost 1:1 with prod.

    Can also be used to test version updates of your DB.