Dec 5, 2025

ListObjectsV2 is Too Slow!

This article is Day 5 of coins Advent Calendar 2025.

Building GitLFS

I was furious at GitHub’s expensive LFS pricing, so I’m developing candylfs, a GitLFS-compatible SaaS.

Needed an API/function to get object count and capacity per path. Initially did full fetch each time, but confirmed obvious slowdown as object count increased.

With 0 objects, ~17ms blazing fast.

With ~4000 objects, ~6 seconds (6000ms). Noticeably slow.

CloudWatch Log Observation

Suspected ListObjectsV2 API was bottleneck, confirmed with logs.

[getSubscription] START tenantId=tenant-a
[getSubscription] Access check: 11ms
[getSubscription] DynamoDB get: 20ms
[getSubscription] S3 client init: 119ms
[getSubscription] Page 1: 1000 objects, 3121ms
[getSubscription] Page 2: 1000 objects, 2503ms
[getSubscription] Page 3: 1000 objects, 7060ms
[getSubscription] Page 4: 936 objects, 1882ms
[getSubscription] ListObjects total: 4 pages, 3936 objects, 14602ms
[getSubscription] END total: 14753ms

REPORT Duration: 14931.48 ms  Memory: 148 MB

Some data redacted, but this is roughly what was captured. Yes, slow. ListObjectsV2 can’t get all objects at once, pagination limited to 1000 per request. Each takes ~2-3 seconds, so 4000 objects = 4 loops = ~12 seconds.

Parallel Execution

Searched for others with this problem, found examples using prefix-based parallel execution for speedup. https://jboothomas.medium.com/fast-listing-s3-objects-from-buckets-with-millions-billions-of-items-380052fb6faf

LFS takes file hash on client upload and uses it as pointer.

All object names are saved as hash values, guaranteeing even distribution across prefixes 0123456789abcdef. So max 16 parallel ListObjectsV2 API calls possible.

Caching

Theoretically up to 16x speed, but 16x execution cost too. So implemented DynamoDB cache. Here’s the result:

[getSubscription] START tenantId=tenant-a
[getSubscription] Access check: 12ms
[getSubscription] DynamoDB get: 74ms
[getSubscription] S3 client init: 0ms (cached)
[getSubscription] Parallel scan: 16 requests, 2925ms
[getSubscription] END total: 3012ms, 3936 objects

REPORT Duration: 3032.60 ms  Memory: 155 MB

[getSubscription] Parallel scan: 16 requests, 2925ms

~4x speedup achieved. Final approach: use DynamoDB cache when available, otherwise fetch with speedup method then cache.

Lambda Cost vs API Cost?

Object count/capacity needed for management console API display and quota check when client calls Batch API. Console can tolerate caching/non-realtime, but quota check needs some accuracy.

So only Batch API gets true value, but 16x cost is unacceptable. Using atomic counter in DynamoDB cache for instant client response, then fetching true value with ListObjectsV2 after.

Question: which costs more, 16 parallel API calls or Lambda runtime? Lambda at 256MB costs:

0.25 GB × 0.0000166667 USD/GB-second
= 0.000004166675 USD / second

ListObjectsV2 is Class A operation at $4.5/1M calls. So 16 parallel execution:

4.5 USD / 1,000,000 × 16
= (4.5 × 16) / 1,000,000
= 72 / 1,000,000
= 0.000072 USD / call

Therefore:

N = object count, MaxKeys (pagination) = 1000, assuming 2.5 seconds per call

Item	Sequential	16 Parallel
API calls	⌈N/1000⌉	16 × ⌈N/16000⌉
Duration	⌈N/1000⌉ × 2.5s	⌈N/16000⌉ × 2.5s

Consider N = 17000 where parallel is disadvantageous:

Item	Sequential	16 Parallel
API calls	17	16 × 2 = 32
Duration	42.5s	5s
Lambda time cost	0.000178 USD	0.000021 USD
API cost	0.0000765 USD	0.000144 USD
Total	0.000255 USD	0.000165 USD

Finding the Break-even Point

Parallel becomes disadvantageous when: shortened Lambda time cost < additional API cost.

Shortened time = Sequential time - Parallel time
Additional API = Parallel API calls - Sequential API calls × 0.0000045

Parallel loses = Shortened time × 0.0000042 < Additional API × 0.0000045

N	Sequential API	16 Parallel API	Additional API	Shortened Time	Parallel Wins?
1000	1	16	+15	0s	Loses
5000	5	16	+11	10s	Wins
10000	10	16	+6	22.5s	Wins
16000	16	16	±0	37.5s	Wins
17000	17	32	+15	37.5s	Wins

Break-even point (seconds) = 1.07 × (16P - S) / (S - P)

S = ⌈N/1000⌉   (Sequential API calls)
P = ⌈N/16000⌉  (Parallel rounds)

Plugging into this formula, around 5000 objects gives 2.95 seconds, so parallel is cheaper beyond this point.