JP / EN

ListObjectsV2 is Too Slow!


This article is Day 5 of coins Advent Calendar 2025.

Building GitLFS

I was furious at GitHub’s expensive LFS pricing, so I’m developing candylfs, a GitLFS-compatible SaaS.

https://candylfs.stream

Needed an API/function to get object count and capacity per path. Initially did full fetch each time, but confirmed obvious slowdown as object count increased.

With 0 objects, ~17ms blazing fast.

With ~4000 objects, ~6 seconds (6000ms). Noticeably slow.

CloudWatch Log Observation

Suspected ListObjectsV2 API was bottleneck, confirmed with logs.

[getSubscription] START tenantId=tenant-a
[getSubscription] Access check: 11ms
[getSubscription] DynamoDB get: 20ms
[getSubscription] S3 client init: 119ms
[getSubscription] Page 1: 1000 objects, 3121ms
[getSubscription] Page 2: 1000 objects, 2503ms
[getSubscription] Page 3: 1000 objects, 7060ms
[getSubscription] Page 4: 936 objects, 1882ms
[getSubscription] ListObjects total: 4 pages, 3936 objects, 14602ms
[getSubscription] END total: 14753ms

REPORT Duration: 14931.48 ms  Memory: 148 MB

Some data redacted, but this is roughly what was captured. Yes, slow. ListObjectsV2 can’t get all objects at once, pagination limited to 1000 per request. Each takes ~2-3 seconds, so 4000 objects = 4 loops = ~12 seconds.

Parallel Execution

Searched for others with this problem, found examples using prefix-based parallel execution for speedup. https://jboothomas.medium.com/fast-listing-s3-objects-from-buckets-with-millions-billions-of-items-380052fb6faf

LFS takes file hash on client upload and uses it as pointer.

All object names are saved as hash values, guaranteeing even distribution across prefixes 0123456789abcdef. So max 16 parallel ListObjectsV2 API calls possible.

Caching

Theoretically up to 16x speed, but 16x execution cost too. So implemented DynamoDB cache. Here’s the result:

[getSubscription] START tenantId=tenant-a
[getSubscription] Access check: 12ms
[getSubscription] DynamoDB get: 74ms
[getSubscription] S3 client init: 0ms (cached)
[getSubscription] Parallel scan: 16 requests, 2925ms
[getSubscription] END total: 3012ms, 3936 objects

REPORT Duration: 3032.60 ms  Memory: 155 MB

[getSubscription] Parallel scan: 16 requests, 2925ms

~4x speedup achieved. Final approach: use DynamoDB cache when available, otherwise fetch with speedup method then cache.

Lambda Cost vs API Cost?

Object count/capacity needed for management console API display and quota check when client calls Batch API. Console can tolerate caching/non-realtime, but quota check needs some accuracy.

So only Batch API gets true value, but 16x cost is unacceptable. Using atomic counter in DynamoDB cache for instant client response, then fetching true value with ListObjectsV2 after.

Question: which costs more, 16 parallel API calls or Lambda runtime? Lambda at 256MB costs:

0.25 GB × 0.0000166667 USD/GB-second
= 0.000004166675 USD / second

ListObjectsV2 is Class A operation at $4.5/1M calls. So 16 parallel execution:

4.5 USD / 1,000,000 × 16
= (4.5 × 16) / 1,000,000
= 72 / 1,000,000
= 0.000072 USD / call

Therefore:

N = object count, MaxKeys (pagination) = 1000, assuming 2.5 seconds per call

ItemSequential16 Parallel
API calls⌈N/1000⌉16 × ⌈N/16000⌉
Duration⌈N/1000⌉ × 2.5s⌈N/16000⌉ × 2.5s

Consider N = 17000 where parallel is disadvantageous:

ItemSequential16 Parallel
API calls1716 × 2 = 32
Duration42.5s5s
Lambda time cost0.000178 USD0.000021 USD
API cost0.0000765 USD0.000144 USD
Total0.000255 USD0.000165 USD

Finding the Break-even Point

Parallel becomes disadvantageous when: shortened Lambda time cost < additional API cost.

Shortened time = Sequential time - Parallel time
Additional API = Parallel API calls - Sequential API calls × 0.0000045

Parallel loses = Shortened time × 0.0000042 < Additional API × 0.0000045
NSequential API16 Parallel APIAdditional APIShortened TimeParallel Wins?
1000116+150sLoses
5000516+1110sWins
100001016+622.5sWins
160001616±037.5sWins
170001732+1537.5sWins
Break-even point (seconds) = 1.07 × (16P - S) / (S - P)

S = ⌈N/1000⌉   (Sequential API calls)
P = ⌈N/16000⌉  (Parallel rounds)

Plugging into this formula, around 5000 objects gives 2.95 seconds, so parallel is cheaper beyond this point.

Back to list