FastAPI vs. Express.js vs. Flask vs. Nest.js Benchmark
I wanted to verify FastAPI’s claim of performance on par with Node.js, so I conducted a benchmark test using wrk, an HTTP benchmarking tool.
To simulate a more “realistic” scenario, each test hits an endpoint that queries a PostgreSQL database. In each case, I created an endpoint that returns one hundred rows of data from the database.
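All of the runs below use wrk’s defaults of 2 threads, 10 connections, and a 10-second duration, so the bare commands are equivalent to (port varies by framework):
$ wrk -t2 -c10 -d10s http://localhost:8000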
I tested the following combinations:
- FastAPI + psycopg2 + uvicorn
- FastAPI + SQLModel + uvicorn
- Flask + psycopg2 + flask run
- Flask + psycopg2 + gunicorn (1 worker)
- Express.js + pg
- Nest.js + Prisma
- Flask + psycopg2 + gunicorn (4 workers)
- FastAPI + psycopg2 + gunicorn (4 workers)
- FastAPI + SQLModel + gunicorn (4 workers)
The code for this test can be found here: https://github.com/travisluong/python-vs-nodejs-benchmark.
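As a rough illustration, the FastAPI + psycopg2 variant looks something like this. This is a minimal sketch, not the exact code from the repo; the connection string and table name are assumptions.

from fastapi import FastAPI
import psycopg2
import psycopg2.extras

app = FastAPI()

# A single shared connection keeps the sketch short; a connection pool
# would be more realistic. Credentials and table name are made up.
conn = psycopg2.connect("dbname=benchmark user=postgres")

@app.get("/")
def read_items():
    # fetch one hundred rows and let FastAPI serialize them to JSON
    with conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor) as cur:
        cur.execute("SELECT * FROM items LIMIT 100")
        return cur.fetchall()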
Here are the results:
FastAPI + psycopg2 + uvicorn
$ uvicorn fast_psycopg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 32.35ms 3.63ms 50.66ms 90.42%
Req/Sec 154.76 13.60 202.00 78.00%
3110 requests in 10.10s, 23.79MB read
Requests/sec: 308.01
Transfer/sec: 2.36MB
FastAPI + SQLModel + uvicorn
$ uvicorn fast_sqlmodel:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 54.50ms 9.41ms 99.36ms 78.56%
Req/Sec 91.89 16.03 130.00 75.38%
1842 requests in 10.10s, 11.28MB read
Requests/sec: 182.45
Transfer/sec: 1.12MB
Flask + psycopg2 + flask run
$ FLASK_APP=flask_psycopg flask run
$ wrk http://localhost:5000
Running 10s test @ http://localhost:5000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 14.06ms 4.48ms 42.07ms 73.73%
Req/Sec 354.51 19.96 404.00 70.50%
7070 requests in 10.01s, 37.77MB read
Non-2xx or 3xx responses: 2501
Requests/sec: 705.95
Transfer/sec: 3.77MB
Flask + psycopg2 + gunicorn (1 worker)
$ gunicorn -w 1 --bind 0.0.0.0:5000 flask_psycopg:app
$ wrk http://localhost:5000
Running 10s test @ http://localhost:5000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 18.50ms 743.07us 22.31ms 84.33%
Req/Sec 269.51 7.33 287.00 55.00%
5386 requests in 10.04s, 46.47MB read
Requests/sec: 536.65
Transfer/sec: 4.63MB
Express.js + pg
$ node express_pg.js
$ wrk http://localhost:3000
Running 10s test @ http://localhost:3000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.19ms 1.04ms 20.60ms 89.87%
Req/Sec 0.97k 89.72 1.08k 66.34%
19521 requests in 10.10s, 151.39MB read
Requests/sec: 1931.99
Transfer/sec: 14.98MB
Nest.js + Prisma
$ npm start
$ wrk http://localhost:3000/feed
Running 10s test @ http://localhost:3000/feed
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 8.49ms 2.37ms 37.77ms 84.94%
Req/Sec 594.97 91.32 720.00 65.00%
11860 requests in 10.02s, 91.98MB read
Requests/sec: 1184.11
Transfer/sec: 9.18MB
Flask + psycopg2 + gunicorn (4 workers)
$ gunicorn -w 4 --bind 0.0.0.0:5000 flask_psycopg:app
$ wrk http://localhost:5000
Running 10s test @ http://localhost:5000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 6.66ms 1.71ms 26.50ms 90.09%
Req/Sec 743.20 60.59 0.89k 65.00%
14814 requests in 10.02s, 127.81MB read
Requests/sec: 1478.02
Transfer/sec: 12.75MB
FastAPI + psycopg2 + gunicorn (4 workers)
$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_psycopg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 10.86ms 7.75ms 39.84ms 78.87%
Req/Sec 499.02 291.50 0.91k 55.50%
9962 requests in 10.07s, 76.19MB read
Requests/sec: 989.50
Transfer/sec: 7.57MB
FastAPI + SQLModel + gunicorn (4 workers)
$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_sqlmodel:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 17.72ms 10.82ms 57.08ms 58.85%
Req/Sec 286.39 112.97 515.00 56.00%
5723 requests in 10.05s, 35.04MB read
Requests/sec: 569.46
Transfer/sec: 3.49MB
Conclusion
It looks like the minimalist Express.js + pg combo wins this benchmarking round, followed by Flask with 4 gunicorn workers and Nest.js + Prisma.
Flask with the “flask run” server had a large number of non-2xx or 3xx responses, as expected of a development server.
FastAPI + psycopg2 + uvicorn, on the other hand, seemed to lag behind Express.js + pg by over 6x.
FastAPI + SQLModel + gunicorn compared to Nest.js + Prisma is about a 2x difference.
Interestingly, Flask + psycopg2 + gunicorn beats out FastAPI + psycopg2 + gunicorn by about 1.5x (1,478 vs. 990 requests/sec).
Is it safe to say that FastAPI is not on par with the Node.js frameworks in terms of performance? Or have I conducted the tests incorrectly? Perhaps I’m not leveraging FastAPI’s async functionality in the right way?
At the end of the day, it probably doesn’t matter too much which framework you choose. Just use whatever language you’re most productive in since developer time is usually more expensive than computing power.
Update 1/3/22
Thanks to Dmitry for pointing out that I should use the asyncpg library instead of psycopg2.
I also included a benchmark for the encode/databases library, which is advertised on the FastAPI website under the Async SQL section.
It appears that FastAPI is still behind Node.js in performance despite adding the async database drivers. The JSON serialization is a possible bottleneck. If anyone knows how to optimize this, please let me know!
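For reference, the asyncpg variant looks roughly like this. Again, a sketch under the same assumptions, not the exact repo code:

import asyncpg
from fastapi import FastAPI

app = FastAPI()

@app.on_event("startup")
async def startup():
    # asyncpg ships its own connection pool; the DSN here is made up
    app.state.pool = await asyncpg.create_pool(
        "postgresql://postgres@localhost/benchmark"
    )

@app.get("/")
async def read_items():
    # acquire a pooled connection and fetch one hundred rows
    async with app.state.pool.acquire() as conn:
        rows = await conn.fetch("SELECT * FROM items LIMIT 100")
        return [dict(r) for r in rows]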
Here are the updated results with asyncpg and databases.
FastAPI + asyncpg + uvicorn
$ uvicorn fast_asyncpg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 31.74ms 3.49ms 79.23ms 97.83%
Req/Sec 158.01 13.26 181.00 85.00%
3172 requests in 10.09s, 24.26MB read
Requests/sec: 314.52
Transfer/sec: 2.41MB
FastAPI + asyncpg + gunicorn (4 workers)
$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_asyncpg:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 10.48ms 4.41ms 27.52ms 59.84%
Req/Sec 478.83 84.78 666.00 60.50%
9552 requests in 10.02s, 73.06MB read
Requests/sec: 952.99
Transfer/sec: 7.29MB
FastAPI + databases + uvicorn
$ uvicorn fast_databases:app
$ wrk http://localhost:8000
Running 10s test @ http://localhost:8000
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 37.31ms 4.39ms 95.69ms 94.49%
Req/Sec 134.30 12.86 151.00 77.50%
2697 requests in 10.08s, 20.63MB read
Requests/sec: 267.69
Transfer/sec: 2.05MB
Update 1/4/22
I have confirmed that the JSON serialization was the bottleneck. The default serializer is much slower than ujson and orjson.
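In FastAPI, switching serializers is a per-route response_class change. Both ORJSONResponse and UJSONResponse live in fastapi.responses and require the orjson and ujson packages respectively. A minimal sketch, where the dummy rows stand in for the 100-row query from the earlier sketches:

from fastapi import FastAPI
from fastapi.responses import ORJSONResponse

app = FastAPI()

@app.get("/orjson", response_class=ORJSONResponse)
async def read_items_orjson():
    # ORJSONResponse serializes the result with orjson instead of the
    # stdlib json module; the payload here is a stand-in for the query
    return [{"id": i, "name": f"item {i}"} for i in range(100)]

It can also be applied app-wide via FastAPI(default_response_class=ORJSONResponse).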
Here are the new results!
FastAPI + psycopg2 + uvicorn + orjson
$ uvicorn fast_psycopg:app
$ wrk http://localhost:8000/orjson
Running 10s test @ http://localhost:8000/orjson
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 12.06ms 1.77ms 30.71ms 87.39%
Req/Sec 415.52 23.51 464.00 72.50%
8317 requests in 10.05s, 63.61MB read
Requests/sec: 827.30
Transfer/sec: 6.33MB
FastAPI + asyncpg + uvicorn + orjson
$ uvicorn fast_asyncpg:app
$ wrk http://localhost:8000/orjson
Running 10s test @ http://localhost:8000/orjson
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.34ms 1.15ms 19.18ms 94.82%
Req/Sec 0.95k 44.71 1.01k 78.50%
18870 requests in 10.01s, 144.33MB read
Requests/sec: 1885.25
Transfer/sec: 14.42MB
FastAPI + asyncpg + uvicorn + ujson
$ uvicorn fast_asyncpg:app
$ wrk http://localhost:8000/ujson
Running 10s test @ http://localhost:8000/ujson
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 5.85ms 0.97ms 18.79ms 88.35%
Req/Sec 0.86k 33.84 0.92k 82.00%
17134 requests in 10.01s, 131.05MB read
Requests/sec: 1711.69
Transfer/sec: 13.09MB
FastAPI + asyncpg + gunicorn (4 workers) + orjson
$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_asyncpg:app
$ wrk http://localhost:8000/orjson
Running 10s test @ http://localhost:8000/orjson
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.64ms 3.04ms 47.94ms 98.38%
Req/Sec 2.11k 733.24 3.31k 54.50%
41995 requests in 10.01s, 321.20MB read
Requests/sec: 4193.72
Transfer/sec: 32.08MB
FastAPI + asyncpg + gunicorn (4 workers) + ujson
$ gunicorn -w 4 -k uvicorn.workers.UvicornWorker fast_asyncpg:app
$ wrk http://localhost:8000/ujson
Running 10s test @ http://localhost:8000/ujson
2 threads and 10 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 2.27ms 794.94us 21.49ms 69.94%
Req/Sec 2.21k 316.77 2.79k 63.00%
44028 requests in 10.00s, 336.75MB read
Requests/sec: 4401.04
Transfer/sec: 33.66MB
Final Conclusion
The winner is FastAPI + asyncpg + 4 gunicorn workers + ujson.
FastAPI is definitely fast, on par with Node.js, and lives up to the hype! At least according to these benchmarks.
Just make sure you’re using the right libraries with it!
I realized there is another flaw in the benchmark: Node.js has a cluster mode for running multiple worker processes (the counterpart of gunicorn’s workers), which I was unaware of. For new benchmarks and a complete ranking, check out part 2 of this benchmarking article:
https://medium.com/@travisluong/fastapi-vs-fastify-vs-spring-boot-vs-gin-benchmark-b672a5c39d6c
If you’re interested in learning more about FastAPI and other amazing tools, check out my Full Stack Tutorial:
https://medium.com/@travisluong/full-stack-next-js-fastapi-postgresql-tutorial-86f0af0747b7
Thanks for reading.