Retry future review
Date: October 7, 2025
Reviewer: AI Assistant
Status: Critical Issues Found
The RetryFuture class has several critical bugs and architectural issues that need immediate attention before increased usage. The most severe issue is that the NetworkService creates a background timer that continues running after disposal, causing the provider disposal errors seen in tests.
Severity Ratings:
- 🔴 Critical: Must fix before production use
- 🟡 High: Should fix soon
- 🟢 Medium: Improve when convenient
Location: _setupMonitoring() and NetworkService
Problem:
When RetryFuture creates or uses a ProviderContainer, it subscribes to the networkServiceProvider. This provider creates a NetworkService which starts a Timer.periodic that runs every 60 seconds. When the RetryFuture is disposed:
- If it owns the container, it disposes it
- The
NetworkServiceis disposed (Timer is cancelled in dispose) - BUT: The timer callback may have already been scheduled and executes AFTER disposal
- This causes:
Cannot use the Ref of networkServiceProvider after it has been disposed
Code Evidence:
// In NetworkService
Timer.periodic(_internetChecker.checkInterval, (_) {
_checkNetworkStatus(); // ⚠️ Can run after disposal!
});
// _checkNetworkStatus tries to access state
state = NetworkServiceState(networkStatus: currentStatus); // ⚠️ Uses disposed ref!
Impact:
- Memory leaks in tests and production
- Errors thrown after disposal
- Unpredictable behavior
- Cannot reliably unit test
Solution:
// In NetworkService - add proper disposal check
@override
void dispose() {
_timer?.cancel();
_timer = null;
super.dispose();
}
Future<void> _checkNetworkStatus() async {
// Add ref.mounted check
if (!ref.mounted) {
LOG.d('NetworkService: Skipping check, provider is disposed');
return;
}
try {
final currentStatus = await _internetChecker.status;
// Check again before updating state
if (!ref.mounted) return;
final previousStatus = state.networkStatus;
if (currentStatus != previousStatus) {
state = NetworkServiceState(networkStatus: currentStatus);
}
} catch (e) {
if (!ref.mounted) return;
// handle error...
}
}
Location: waitForNetwork() method
Problem:
Multiple concurrent calls to waitForNetwork() can create race conditions:
if (isNotComplete) {
// Fall through to return the existing future with timeout
} else {
_networkCompleter = Completer<bool>(); // ⚠️ Not thread-safe!
}
If two operations call waitForNetwork() simultaneously:
- First checks
isNotComplete(null) → false - Second checks
isNotComplete(null) → false - Both create new Completers, second overwrites first
- First caller waits on wrong completer → never completes
Impact:
- Hangs/timeouts in concurrent scenarios
- Unpredictable behavior
- Hard to debug
Solution:
Future<bool> waitForNetwork() async {
if (_online) {
LOG.d('Network is already online, no need to wait.');
return true;
}
// Create completer only if we don't have one
_networkCompleter ??= Completer<bool>();
try {
bool result = await _networkCompleter!.future.timeout(options.networkTimeout);
return result;
} on TimeoutException {
LOG.e('waitForNetwork timed out after ${options.networkTimeout.inSeconds} seconds.');
if (!_networkCompleter!.isCompleted) {
_networkCompleter!.complete(false);
}
return false;
} catch (e) {
LOG.e('waitForNetwork completed with an error: $e');
rethrow;
} finally {
// Reset for next wait
_networkCompleter = null;
}
}
Location: _checkState() method
Problem:
try {
_container.read(networkServiceProvider).networkStatus;
} catch (e) {
LOG.w('Failed to read network status: $e');
_throw('Failed to read network status (the container might have been disposed): $e');
}
This catches ALL exceptions and throws RetryException, but:
- Doesn’t check
ref.mounted - Doesn’t distinguish between disposed container vs other errors
- May hide legitimate errors
Solution:
void _checkState() {
if (_disposed) {
_throw('RetryFuture is disposed, cannot check container state.');
}
if (_ownsContainer) {
// We own it and track disposal ourselves
return;
}
// For external containers, verify it's still valid
try {
final provider = _container.readProviderElement(networkServiceProvider);
if (provider == null) {
_throw('Network service provider is not available (container may be disposed)');
}
} on StateError catch (e) {
_throw('Container has been disposed: $e', cause: e);
} catch (e) {
_throw('Unexpected error checking container state: $e', cause: e);
}
}
Location: _networkListener() method
Problem:
if (_attempt + 1 > options.maxAttempts) {
LOG.w('Attempt count ($_attempt) soon exceeds max attempts (${options.maxAttempts}), -1 to allow retry.');
_attempt--; // ⚠️ Manipulating attempt counter in event handler!
}
This is problematic because:
- Side effects in listener are hard to reason about
- Can create situations where
_attemptgoes negative - Tightly couples network events to retry logic
- Makes testing difficult
Better Approach:
// In _networkListener - just track state
if (!wasOnline && _online) {
LOG.i('Network back online. Resetting retry state.');
_resetRetryState();
if (isNotComplete) {
LOG.d('Network back online, completing pending network ready completer.');
_networkCompleter!.complete(true);
}
}
// In tryRetry - check network recovery
if (!_online) {
bool recovered = await waitForNetwork();
if (recovered) {
// Reset attempts when network recovers
_attempt = 0;
} else {
_throw('Network timeout after ${options.networkTimeout}');
}
}
Location: dispose() method
Problem:
if (isNotComplete) {
LOG.w('Retrier disposed while waiting for network. Completing with error.');
// Return true, as this may not be an error but the result of user action
_networkCompleter!.complete(true); // ⚠️ Completes with success when disposing?
}
Completing with true when disposing is confusing:
truesuggests network is available (it’s not)- Comment says “may not be an error” but could mask real issues
- Caller can’t distinguish disposal from actual network recovery
Solution:
if (isNotComplete) {
LOG.w('Retrier disposed while waiting for network.');
// Complete with error so caller knows disposal happened
_networkCompleter!.completeError(
RetryException.fromCode(
RetryExceptionCode.disposed,
debugMsg: 'RetryFuture was disposed while waiting for network',
),
);
}
Add new error code:
enum RetryExceptionCode {
error,
networkError,
disposed, // NEW
}
Problem:
Once tryRetry() is called, there’s no way to cancel it except disposing the entire RetryFuture. In real-world scenarios (like uploads), users may want to cancel specific operations.
Solution:
class RetryFuture<T> {
bool _cancelled = false;
/// Cancel the current retry operation
void cancel() {
_cancelled = true;
if (isNotComplete) {
_networkCompleter?.complete(false);
}
}
Future<T> tryRetry() async {
_cancelled = false; // Reset on new attempt
_attempt = 0;
while (true) {
if (_cancelled) {
_throw('Operation was cancelled', code: RetryExceptionCode.cancelled);
}
_checkState();
// ... rest of logic
}
}
}
Current Issues:
- Hard to test without actual network
- Timer-based logic is time-dependent
- No dependency injection for NetworkService
Improvements:
// Add factory for testing
class RetryFuture<T> {
final NetworkMonitor? _networkMonitor;
RetryFuture({
required this.operation,
ProviderContainer? container,
NetworkMonitor? networkMonitor, // For testing
// ...
}) : _networkMonitor = networkMonitor {
// Use injected monitor if provided (tests)
// Otherwise use provider-based monitor (production)
}
}
abstract class NetworkMonitor {
bool get isOnline;
Stream<bool> get onlineStream;
}
Current: Logs are scattered and lack context
Improvement:
class RetryFuture<T> {
final String? _operationId; // For tracing
void _log(String level, String message, {Object? error, StackTrace? st}) {
final prefix = _operationId != null ? '[$_operationId] ' : '';
switch (level) {
case 'debug': LOG.d('$prefix$message');
case 'info': LOG.i('$prefix$message');
case 'warn': LOG.w('$prefix$message');
case 'error': LOG.e('$prefix$message', error: error, stackTrace: st);
}
}
}
Current Confusion:
// What's the difference?
final rf = RetryFuture(operation: () => fetch());
await rf.tryRetry(); // Why "try"? It either retries or throws
// vs
rf.dispose(); // When do I call this?
Clearer API:
// Option 1: Make it more like a regular Future
class RetryFuture<T> extends Future<T> {
// Can be awaited directly
// Auto-disposes on completion
}
// Option 2: Use explicit resource management
class RetryOperation<T> implements Disposable {
Future<T> execute(); // Clearer intent
// Use with async pattern
Future<T> withRetry(Future<T> Function() op) async {
final retry = RetryOperation(operation: op);
try {
return await retry.execute();
} finally {
await retry.dispose();
}
}
}
class RetryMetrics {
int totalAttempts = 0;
int successfulRetries = 0;
int failedRetries = 0;
Duration totalRetryTime = Duration.zero;
List<Duration> attemptDurations = [];
}
class RetryFuture<T> {
final RetryMetrics metrics = RetryMetrics();
Future<T> tryRetry() async {
final startTime = DateTime.now();
try {
// ... existing logic
metrics.successfulRetries++;
return result;
} catch (e) {
metrics.failedRetries++;
rethrow;
} finally {
metrics.totalRetryTime = DateTime.now().difference(startTime);
}
}
}
- ✅ Fix NetworkService disposal (add
ref.mountedchecks) - ✅ Fix
waitForNetwork()race condition - ✅ Improve
_checkState()error handling - ✅ Add unit tests for disposal scenarios
- ✅ Refactor attempt counter logic
- ✅ Fix disposal semantics (complete with error)
- ✅ Add cancellation support
- ✅ Add integration tests
- ✅ Improve testability with dependency injection
- ✅ Add comprehensive logging
- ✅ Simplify public API
- ✅ Add metrics/observability
test('disposes cleanly without errors', () async {
final rf = RetryFuture(operation: () async => 'test');
await rf.dispose();
// Should not throw
});
test('handles concurrent waitForNetwork calls', () async {
// Test race condition fix
});
test('can be cancelled mid-retry', () async {
// Test cancellation
});
test('properly handles disposed container', () async {
// Test external container disposal
});
test('retries operation with real network service', () async {
// Use actual providers
});
test('handles network going offline and back online', () async {
// Simulate network changes
});