RetryFuture Code Review & Improvement Plan

Date: October 7, 2025
Reviewer: AI Assistant
Status: Critical Issues Found

Executive Summary

The RetryFuture class has several critical bugs and architectural issues that need immediate attention before increased usage. The most severe issue is that the NetworkService creates a background timer that continues running after disposal, causing the provider disposal errors seen in tests.

Severity Ratings:

🔴 Critical: Must fix before production use
🟡 High: Should fix soon
🟢 Medium: Improve when convenient

Critical Issues 🔴

1. Provider Lifecycle Bug - Memory Leak & Disposal Errors

Location: _setupMonitoring() and NetworkService

Problem: When RetryFuture creates or uses a ProviderContainer, it subscribes to the networkServiceProvider. This provider creates a NetworkService which starts a Timer.periodic that runs every 60 seconds. When the RetryFuture is disposed:

If it owns the container, it disposes it
The NetworkService is disposed (Timer is cancelled in dispose)
BUT: The timer callback may have already been scheduled and executes AFTER disposal
This causes: Cannot use the Ref of networkServiceProvider after it has been disposed

Code Evidence:

// In NetworkService
Timer.periodic(_internetChecker.checkInterval, (_) {
  _checkNetworkStatus();  // ⚠️ Can run after disposal!
});

// _checkNetworkStatus tries to access state
state = NetworkServiceState(networkStatus: currentStatus);  // ⚠️ Uses disposed ref!

Impact:

Memory leaks in tests and production
Errors thrown after disposal
Unpredictable behavior
Cannot reliably unit test

Solution:

// In NetworkService - add proper disposal check
@override
void dispose() {
  _timer?.cancel();
  _timer = null;
  super.dispose();
}

Future<void> _checkNetworkStatus() async {
  // Add ref.mounted check
  if (!ref.mounted) {
    LOG.d('NetworkService: Skipping check, provider is disposed');
    return;
  }
  
  try {
    final currentStatus = await _internetChecker.status;
    
    // Check again before updating state
    if (!ref.mounted) return;
    
    final previousStatus = state.networkStatus;
    if (currentStatus != previousStatus) {
      state = NetworkServiceState(networkStatus: currentStatus);
    }
  } catch (e) {
    if (!ref.mounted) return;
    // handle error...
  }
}

2. Race Condition in `waitForNetwork()`

Location: waitForNetwork() method

Problem: Multiple concurrent calls to waitForNetwork() can create race conditions:

if (isNotComplete) {
  // Fall through to return the existing future with timeout
} else {
  _networkCompleter = Completer<bool>();  // ⚠️ Not thread-safe!
}

If two operations call waitForNetwork() simultaneously:

First checks isNotComplete (null) → false
Second checks isNotComplete (null) → false
Both create new Completers, second overwrites first
First caller waits on wrong completer → never completes

Impact:

Hangs/timeouts in concurrent scenarios
Unpredictable behavior
Hard to debug

Solution:

Future<bool> waitForNetwork() async {
  if (_online) {
    LOG.d('Network is already online, no need to wait.');
    return true;
  }

  // Create completer only if we don't have one
  _networkCompleter ??= Completer<bool>();
  
  try {
    bool result = await _networkCompleter!.future.timeout(options.networkTimeout);
    return result;
  } on TimeoutException {
    LOG.e('waitForNetwork timed out after ${options.networkTimeout.inSeconds} seconds.');
    if (!_networkCompleter!.isCompleted) {
      _networkCompleter!.complete(false);
    }
    return false;
  } catch (e) {
    LOG.e('waitForNetwork completed with an error: $e');
    rethrow;
  } finally {
    // Reset for next wait
    _networkCompleter = null;
  }
}

3. Incomplete Error Handling in `_checkState()`

Location: _checkState() method

Problem:

try {
  _container.read(networkServiceProvider).networkStatus;
} catch (e) {
  LOG.w('Failed to read network status: $e');
  _throw('Failed to read network status (the container might have been disposed): $e');
}

This catches ALL exceptions and throws RetryException, but:

Doesn’t check ref.mounted
Doesn’t distinguish between disposed container vs other errors
May hide legitimate errors

Solution:

void _checkState() {
  if (_disposed) {
    _throw('RetryFuture is disposed, cannot check container state.');
  }

  if (_ownsContainer) {
    // We own it and track disposal ourselves
    return;
  }

  // For external containers, verify it's still valid
  try {
    final provider = _container.readProviderElement(networkServiceProvider);
    if (provider == null) {
      _throw('Network service provider is not available (container may be disposed)');
    }
  } on StateError catch (e) {
    _throw('Container has been disposed: $e', cause: e);
  } catch (e) {
    _throw('Unexpected error checking container state: $e', cause: e);
  }
}

High Priority Issues 🟡

4. Attempt Counter Logic Error

Location: _networkListener() method

Problem:

if (_attempt + 1 > options.maxAttempts) {
  LOG.w('Attempt count ($_attempt) soon exceeds max attempts (${options.maxAttempts}), -1 to allow retry.');
  _attempt--;  // ⚠️ Manipulating attempt counter in event handler!
}

This is problematic because:

Side effects in listener are hard to reason about
Can create situations where _attempt goes negative
Tightly couples network events to retry logic
Makes testing difficult

Better Approach:

// In _networkListener - just track state
if (!wasOnline && _online) {
  LOG.i('Network back online. Resetting retry state.');
  _resetRetryState();
  
  if (isNotComplete) {
    LOG.d('Network back online, completing pending network ready completer.');
    _networkCompleter!.complete(true);
  }
}

// In tryRetry - check network recovery
if (!_online) {
  bool recovered = await waitForNetwork();
  if (recovered) {
    // Reset attempts when network recovers
    _attempt = 0;
  } else {
    _throw('Network timeout after ${options.networkTimeout}');
  }
}

5. Unclear Disposal Semantics

Location: dispose() method

Problem:

if (isNotComplete) {
  LOG.w('Retrier disposed while waiting for network. Completing with error.');
  // Return true, as this may not be an error but the result of user action
  _networkCompleter!.complete(true);  // ⚠️ Completes with success when disposing?
}

Completing with true when disposing is confusing:

true suggests network is available (it’s not)
Comment says “may not be an error” but could mask real issues
Caller can’t distinguish disposal from actual network recovery

Solution:

if (isNotComplete) {
  LOG.w('Retrier disposed while waiting for network.');
  // Complete with error so caller knows disposal happened
  _networkCompleter!.completeError(
    RetryException.fromCode(
      RetryExceptionCode.disposed,
      debugMsg: 'RetryFuture was disposed while waiting for network',
    ),
  );
}

Add new error code:

enum RetryExceptionCode {
  error,
  networkError,
  disposed,  // NEW
}

6. No Cancellation Support

Problem: Once tryRetry() is called, there’s no way to cancel it except disposing the entire RetryFuture. In real-world scenarios (like uploads), users may want to cancel specific operations.

Solution:

class RetryFuture<T> {
  bool _cancelled = false;
  
  /// Cancel the current retry operation
  void cancel() {
    _cancelled = true;
    if (isNotComplete) {
      _networkCompleter?.complete(false);
    }
  }
  
  Future<T> tryRetry() async {
    _cancelled = false;  // Reset on new attempt
    _attempt = 0;
    
    while (true) {
      if (_cancelled) {
        _throw('Operation was cancelled', code: RetryExceptionCode.cancelled);
      }
      
      _checkState();
      // ... rest of logic
    }
  }
}

Medium Priority Improvements 🟢

7. Improve Testability

Current Issues:

Hard to test without actual network
Timer-based logic is time-dependent
No dependency injection for NetworkService

Improvements:

// Add factory for testing
class RetryFuture<T> {
  final NetworkMonitor? _networkMonitor;
  
  RetryFuture({
    required this.operation,
    ProviderContainer? container,
    NetworkMonitor? networkMonitor,  // For testing
    // ...
  }) : _networkMonitor = networkMonitor {
    // Use injected monitor if provided (tests)
    // Otherwise use provider-based monitor (production)
  }
}

abstract class NetworkMonitor {
  bool get isOnline;
  Stream<bool> get onlineStream;
}

8. Add Comprehensive Logging Context

Current: Logs are scattered and lack context

Improvement:

class RetryFuture<T> {
  final String? _operationId;  // For tracing
  
  void _log(String level, String message, {Object? error, StackTrace? st}) {
    final prefix = _operationId != null ? '[$_operationId] ' : '';
    switch (level) {
      case 'debug': LOG.d('$prefix$message');
      case 'info': LOG.i('$prefix$message');
      case 'warn': LOG.w('$prefix$message');
      case 'error': LOG.e('$prefix$message', error: error, stackTrace: st);
    }
  }
}

9. Simplify Public API

Current Confusion:

// What's the difference?
final rf = RetryFuture(operation: () => fetch());
await rf.tryRetry();  // Why "try"? It either retries or throws

// vs
rf.dispose();  // When do I call this?

Clearer API:

// Option 1: Make it more like a regular Future
class RetryFuture<T> extends Future<T> {
  // Can be awaited directly
  // Auto-disposes on completion
}

// Option 2: Use explicit resource management
class RetryOperation<T> implements Disposable {
  Future<T> execute();  // Clearer intent
  
  // Use with async pattern
  Future<T> withRetry(Future<T> Function() op) async {
    final retry = RetryOperation(operation: op);
    try {
      return await retry.execute();
    } finally {
      await retry.dispose();
    }
  }
}

10. Add Metrics/Observability

class RetryMetrics {
  int totalAttempts = 0;
  int successfulRetries = 0;
  int failedRetries = 0;
  Duration totalRetryTime = Duration.zero;
  List<Duration> attemptDurations = [];
}

class RetryFuture<T> {
  final RetryMetrics metrics = RetryMetrics();
  
  Future<T> tryRetry() async {
    final startTime = DateTime.now();
    try {
      // ... existing logic
      metrics.successfulRetries++;
      return result;
    } catch (e) {
      metrics.failedRetries++;
      rethrow;
    } finally {
      metrics.totalRetryTime = DateTime.now().difference(startTime);
    }
  }
}

Recommended Action Plan

Phase 1: Critical Fixes (This Week) 🔴

✅ Fix NetworkService disposal (add ref.mounted checks)
✅ Fix waitForNetwork() race condition
✅ Improve _checkState() error handling
✅ Add unit tests for disposal scenarios

Phase 2: High Priority (Next Sprint) 🟡

✅ Refactor attempt counter logic
✅ Fix disposal semantics (complete with error)
✅ Add cancellation support
✅ Add integration tests

Phase 3: Improvements (Following Sprint) 🟢

✅ Improve testability with dependency injection
✅ Add comprehensive logging
✅ Simplify public API
✅ Add metrics/observability

Testing Strategy

Unit Tests Needed:

test('disposes cleanly without errors', () async {
  final rf = RetryFuture(operation: () async => 'test');
  await rf.dispose();
  // Should not throw
});

test('handles concurrent waitForNetwork calls', () async {
  // Test race condition fix
});

test('can be cancelled mid-retry', () async {
  // Test cancellation
});

test('properly handles disposed container', () async {
  // Test external container disposal
});

Integration Tests Needed:

test('retries operation with real network service', () async {
  // Use actual providers
});

test('handles network going offline and back online', () async {
  // Simulate network changes
});

RetryFuture Code Review & Improvement Plan

Executive Summary

Critical Issues 🔴

1. Provider Lifecycle Bug - Memory Leak & Disposal Errors

2. Race Condition in waitForNetwork()

3. Incomplete Error Handling in _checkState()

High Priority Issues 🟡

4. Attempt Counter Logic Error

5. Unclear Disposal Semantics

6. No Cancellation Support

Medium Priority Improvements 🟢

7. Improve Testability

8. Add Comprehensive Logging Context

9. Simplify Public API

10. Add Metrics/Observability

Recommended Action Plan

Phase 1: Critical Fixes (This Week) 🔴

Phase 2: High Priority (Next Sprint) 🟡

Phase 3: Improvements (Following Sprint) 🟢

Testing Strategy

Unit Tests Needed:

Integration Tests Needed:

2. Race Condition in `waitForNetwork()`

3. Incomplete Error Handling in `_checkState()`