# Propreports Normalizer - Implementation Guide

## Executive Summary

**Status:** CRITICAL ⚠️⚠️⚠️ - Current hash match rate: ~0%

**Critical Issue:** The `portfolio` and `reference_code` fields are added to the order before hashing, so 100% of re-imported trades are duplicated

**Implementation Phases:**
- **Phase 1 (CRITICAL):** Hash fix - 1-2 days
- **Phase 2:** Data validations - 1.5 days
- **Phase 3 (CONDITIONAL):** CSV support - 0.5-4 days

**Total Critical Path:** 2.5-3.5 days (Phases 1-2 only)

---

## Table of Contents

1. [Contexto](#contexto)
2. [Arquitectura](#arquitectura)
3. [Issue Crítico: Hash Computation](#issue-crítico-hash-computation)
4. [Fase 1: Critical Hash Fix (1-2 días)](#fase-1-critical-hash-fix-1-2-días)
5. [Fase 2: Data Validation (1.5 días)](#fase-2-data-validation-15-días)
6. [Fase 3: CSV Support (0.5-4 días - CONDICIONAL)](#fase-3-csv-support-05-4-días---condicional)
7. [Testing Strategy](#testing-strategy)
8. [Deployment Checklist](#deployment-checklist)

---

## Contexto

### Broker Information

- **Broker:** Propreports (ID: 240)
- **Assets:** Stocks, Options (OCC format)
- **Format:** JSON API (sync)
- **CSV Formats (Legacy):** 3 variants (676 lines)
- **Broker Variants:** PropReport, Zimtra, Ocean One Securities, CenterPoint Securities, CMEG

### Code Metrics

| Metric | Legacy | New | Reduction |
|--------|--------|-----|-----------|
| **API normalization** | 604 lines | 423 lines | **30%** ✅ |
| **CSV parsers** | 676 lines | 0 lines | **N/A** ⚠️ |
| **Tests** | 0 lines | 554 lines | **N/A** ✅ |
| **Total** | 1,280 lines | 1,015 lines | **21%** ✅ |

The new total also counts `__init__.py` (6 lines) and `detector.py` (32 lines), which have no row of their own.

### Hash Match Rate Comparison

| Broker | Hash Match Rate | Critical Issues |
|--------|-----------------|-----------------|
| **Propreports** | **~0%** ⚠️⚠️⚠️ | Portfolio/ref fields in hash |
| Oanda | ~0% ⚠️⚠️⚠️ | Buy/sell field missing |
| Deribit | 74.27% ⚠️ | Expired options suffix |
| Charles Schwab | 42% ⚠️ | ClosingPrice volatility |
| OKX | 95-100% ✅ | None |
| KuCoin | 100% ✅ | None |

**Conclusion:** Propreports has the SAME critical hash match rate as Oanda (~0%).

---

## Arquitectura

### Pipeline Flow

```
Raw Data (JSON API)
        ↓
┌─────────────────────────────┐
│ PropreportsInterpreter      │
│ - parse_json_content()      │ ← Parse JSON, compute hash
│ - normalize()               │ ← Transform to 20-column schema
└─────────────────────────────┘
        ↓
Normalized Executions
(20 columns schema)
        ↓
p02_deduplicate (file_row hash)
        ↓
p03_group (grouping logic)
        ↓
p04_calculate (P&L, metrics)
        ↓
p05_write (database)
```

### Key Architectural Patterns

1. **JSON API Parsing**
   - Input: Array of orders from PropReports API
   - Supports wrapping in objects with `orders` or `data` keys
   - Pre-computes: file_row_hash, original_order, fee_sum, row_index

2. **Hash Computation Pattern**
   ```python
   # Step 1: strip milliseconds from date/time
   # Step 2: hash the order WITHOUT portfolio/reference_code  ← CRITICAL FIX NEEDED
   ```

3. **Fee Structure**
   - **10 fee columns (new implementation):** ECN Fee, SEC, ORF, CAT, TAF, NFA, NSCC, Acc, Clr, Misc
   - **Legacy:** 11 columns; it also includes NASD (missing in the new implementation)
   - **Summation:** All fees are summed into the `fees` column

4. **Option Symbol Format (OCC)**
   ```
   +SPY250722C00626000
   ├─ + : Option prefix (stripped)
   ├─ SPY : Underlying (variable length)
   ├─ 250722 : Expiration (YYMMDD)
   ├─ C : Call (or P for Put)
   └─ 00626000 : Strike * 1000 (8 digits) → 626.00
   ```

5. **Side Mappings**
   - B → BUY
   - BUY → BUY
   - COVER → BUY (missing in new implementation)
   - S → SELL
   - SELL → SELL
   - SHORT → SELL
   - T → SELL (Short sell)
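
The OCC layout described in pattern 4 can be parsed with a single anchored regular expression. The sketch below is illustrative only: `parse_occ_symbol` and `ParsedOption` are hypothetical names (the real helper is `_parse_option_symbol()`, whose signature is not shown here), and it assumes the underlying consists of letters only.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional

# Optional "+" prefix, underlying, YYMMDD expiration, C/P flag, 8-digit strike.
_OCC_PATTERN = re.compile(r"^\+?([A-Z]+)(\d{6})([CP])(\d{8})$")


class ParsedOption(NamedTuple):
    underlying: str
    expiration: date
    option_type: str  # "CALL" or "PUT"
    strike: float


def parse_occ_symbol(symbol: str) -> Optional[ParsedOption]:
    """Return the parsed option, or None when the symbol is not an OCC option."""
    match = _OCC_PATTERN.match(symbol.strip().upper())
    if match is None:
        return None
    underlying, yymmdd, flag, strike_digits = match.groups()
    try:
        expiration = datetime.strptime(yymmdd, "%y%m%d").date()
    except ValueError:
        return None  # six digits that do not form a real calendar date
    return ParsedOption(
        underlying=underlying,
        expiration=expiration,
        option_type="CALL" if flag == "C" else "PUT",
        strike=int(strike_digits) / 1000,  # strike is stored as price * 1000
    )


print(parse_occ_symbol("+SPY250722C00626000"))
print(parse_occ_symbol("SPY"))
```

Because the last 15 characters have a fixed layout, everything before them is the underlying, which is what allows its length to vary.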

### File Structure

```
brokers/propreports/
├── __init__.py           (6 lines)    - Module exports
├── detector.py           (32 lines)   - Format detection
└── propreports.py        (423 lines)  - Main interpreter
    ├── BROKER_ID = "propreports"
    ├── SIDE_MAP (6 mappings)
    ├── FEE_COLUMNS (10 columns)
    ├── can_handle()                (class method)
    ├── parse_json_content()        (class method) ← Hash computation
    ├── normalize()                 (instance method)
    └── _parse_option_symbol()      (helper)

tests/brokers/
└── test_propreports.py   (554 lines) - 54 test cases
```

---

## Issue Crítico: Hash Computation

### Root Cause Analysis

**Problem:** ~0% hash match rate → 100% trade duplication

**Cause:** The new implementation adds `portfolio` and `reference_code` to the order object BEFORE hashing. The legacy code does NOT add these fields.

### Evidence Comparison

#### Legacy (propreports_export.py:476-482)

```python
try:
    # Strip milliseconds
    order['date/time'] = order['date/time'].split('.')[0]
except:
    pass

# Hash WITHOUT portfolio/reference_code
njson = json.dumps(order)
njson = hashlib.md5(njson.encode('utf-8')).hexdigest()
```

#### Current (propreports.py:163-178)

```python
# Build order for hash
order_for_hash = dict(order)

# Strip milliseconds
order_for_hash[datetime_key] = dt_value.split('.')[0]

# ADD portfolio and reference_code BEFORE hashing ← PROBLEM
order_for_hash["portfolio"] = portfolio
order_for_hash["reference_code"] = reference_code

# Hash WITH these fields included
file_row_hash = hashlib.md5(json.dumps(order_for_hash).encode('utf-8')).hexdigest()
```
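
To make the effect concrete, here is a minimal sketch that hashes the same order both ways. The order values and the `md5_of` helper are made up for illustration; only the `json.dumps` plus MD5 formula comes from the snippets above.

```python
import hashlib
import json


def md5_of(order: dict) -> str:
    """MD5 of the default json.dumps serialization, as in the legacy snippet."""
    return hashlib.md5(json.dumps(order).encode("utf-8")).hexdigest()


# Illustrative order; the field values are invented.
order = {"propreports id": "82041", "date/time": "6/11/2025 15:03:12", "symbol": "SPY"}

legacy_hash = md5_of(order)

# What the current code does: metadata is appended before hashing.
with_metadata = dict(order, portfolio="main", reference_code="abc123")
current_hash = md5_of(with_metadata)

print(f"legacy:  {legacy_hash}")
print(f"current: {current_hash}")
print(f"match:   {legacy_hash == current_hash}")
```

The two serialized strings differ by the extra keys, so the digests differ for every order, which is consistent with the ~0% match rate.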

### Impact Analysis

| Aspect | Impact |
|--------|--------|
| **Hash match rate** | ~0% (tied with Oanda for worst) |
| **Re-import behavior** | 100% of trades duplicated |
| **Data integrity** | ✅ 100% correct (only the hash differs) |
| **User experience** | ⚠️⚠️⚠️ CRITICAL - duplicate trades everywhere |
| **Database bloat** | ⚠️⚠️⚠️ CRITICAL - 2x storage usage |

### Fixture Notes Contradiction

**Fixture (tests/integration/fixtures/propreports.json:17):**
```json
"notes": "Validated 100% match rate on 20 records for user 4359..."
```

The hash formula described in the fixture notes does NOT mention portfolio/reference_code, yet the new code adds both fields before hashing. This is **contradictory** and suggests:

1. Fixture notes describe LEGACY formula (correct)
2. New implementation DEVIATES from legacy (incorrect)
3. "100% match" claim needs verification with real data

---

## Fase 1: Critical Hash Fix (1-2 días)

### Objectives

- [ ] Remove portfolio/reference_code from hash computation
- [ ] Achieve hash match rate >= 95%
- [ ] Pass all 8 hash compatibility tests
- [ ] Verify with real legacy data

### Complexity

- **Level:** HIGH
- **Risk:** HIGH (critical for deduplication)
- **Effort:** 1-2 days
- **Tests:** 8 new tests

### Implementation

#### Step 1.1: Modify Hash Computation (0.5 days)

**File:** `propreports.py:173-178`

**Current Code (PROBLEM):**
```python
# Lines 173-178
order_for_hash = dict(order)
order_for_hash[datetime_key] = dt_value.split('.')[0]
order_for_hash["portfolio"] = portfolio  # ← REMOVE
order_for_hash["reference_code"] = reference_code  # ← REMOVE
file_row_hash = hashlib.md5(json.dumps(order_for_hash).encode('utf-8')).hexdigest()
```

**Modified Code (SOLUTION):**
```python
# Lines 173-178
# Build order for hash (preserving original key order)
order_for_hash = dict(order)

# Strip milliseconds from date/time (legacy compatibility)
datetime_key = "date/time" if "date/time" in order_for_hash else "Date/Time"
if datetime_key in order_for_hash:
    dt_value = str(order_for_hash[datetime_key])
    # Split at '.' to remove milliseconds: "6/11/2025 15:03:12.000" → "6/11/2025 15:03:12"
    order_for_hash[datetime_key] = dt_value.split('.')[0]

# DO NOT add portfolio/reference_code to hash (legacy compatibility)
# Legacy formula (propreports_export.py:482): MD5(json.dumps(order))
# These fields are metadata added AFTER hashing for organization

# Hash the JSON (legacy compatible - no sort_keys to preserve order)
file_row_hash = hashlib.md5(json.dumps(order_for_hash).encode('utf-8')).hexdigest()
```

**Checklist:**
- [ ] Remove the two assignments that add `portfolio` and `reference_code` to `order_for_hash`
- [ ] Add documentation comment explaining why
- [ ] Verify millisecond stripping still works
- [ ] Run unit tests
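
The "no sort_keys" comment in the modified code matters because the MD5 is taken over the serialized string, not over the logical content. The following sketch (with a hypothetical `md5_of` helper and invented values) shows that key order and `json.dumps` options all change the digest:

```python
import hashlib
import json


def md5_of(order: dict, **dumps_kwargs) -> str:
    """MD5 of json.dumps(order, **dumps_kwargs)."""
    serialized = json.dumps(order, **dumps_kwargs)
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


api_order = {"symbol": "SPY", "price": "450.50"}   # key order as received
reordered = {"price": "450.50", "symbol": "SPY"}   # same data, different key order

api_hash = md5_of(api_order)
reordered_hash = md5_of(reordered)
sorted_hash = md5_of(api_order, sort_keys=True)
compact_hash = md5_of(api_order, separators=(",", ":"))

print(api_hash == reordered_hash)  # key order changes the digest
print(api_hash == sorted_hash)     # so does sort_keys=True
print(api_hash == compact_hash)    # and so do non-default separators
```

Any of these differences would make a new hash diverge from the legacy one, so the fix should keep the default `json.dumps` arguments and the key order delivered by the API.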

#### Step 1.2: Add Hash Compatibility Tests (0.5 days)

**File:** `test_propreports.py`

**Test Class:**
```python
class TestPropreportsHashCompatibility:
    """Test hash computation for legacy compatibility"""

    def test_hash_without_portfolio_reference_code(self):
        """Hash must NOT include portfolio/reference_code"""

    def test_hash_matches_legacy_format(self):
        """Hash must match legacy formula exactly"""

    def test_hash_deterministic(self):
        """Same input must produce same hash"""

    def test_hash_milliseconds_stripped(self):
        """Milliseconds must be stripped before hashing"""

    def test_hash_json_key_order(self):
        """Document key order behavior (preserved from API)"""

    def test_hash_with_sample_data(self):
        """Integration test with sample data"""

    def test_hash_user_4359_compatibility(self):
        """Verify compatibility with user 4359 (newer format)"""

    def test_hash_user_40888_compatibility(self):
        """Verify compatibility with user 40888 (older format)"""
```

**Checklist:**
- [ ] 8 tests implemented
- [ ] All tests passing
- [ ] Edge cases covered
- [ ] Integration test with real data
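
As a starting point for filling in the test bodies, here is a sketch of three of them written as plain pytest-style functions. `reference_hash` is a hypothetical stand-in for the legacy formula; in the real suite the assertions would compare it against the hash produced by `PropreportsInterpreter.parse_json_content()`.

```python
import hashlib
import json


def reference_hash(order: dict) -> str:
    """Stand-in for the legacy formula: strip milliseconds, then MD5 the JSON."""
    prepared = dict(order)  # copy so the caller's order is not mutated
    if "date/time" in prepared:
        prepared["date/time"] = str(prepared["date/time"]).split(".")[0]
    return hashlib.md5(json.dumps(prepared).encode("utf-8")).hexdigest()


# Invented sample order with milliseconds in the timestamp.
SAMPLE = {"propreports id": "82041", "date/time": "6/11/2025 15:03:12.000", "symbol": "SPY"}


def test_hash_deterministic():
    assert reference_hash(SAMPLE) == reference_hash(dict(SAMPLE))


def test_hash_milliseconds_stripped():
    # Overriding an existing key keeps its position, so key order is unchanged.
    without_ms = {**SAMPLE, "date/time": "6/11/2025 15:03:12"}
    assert reference_hash(SAMPLE) == reference_hash(without_ms)


def test_hash_does_not_mutate_input():
    reference_hash(SAMPLE)
    assert SAMPLE["date/time"].endswith(".000")
```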

#### Step 1.3: Manual Verification (0.5 days)

**SQL Query:**

```sql
-- Check whether legacy rows contain portfolio/reference_code
SELECT
    fk_user_id,
    COUNT(*) AS records,
    SUM(CASE WHEN original_file_row LIKE '%portfolio%' THEN 1 ELSE 0 END) AS has_portfolio,
    SUM(CASE WHEN original_file_row LIKE '%reference_code%' THEN 1 ELSE 0 END) AS has_ref,
    -- Aggregated so the query stays valid under ONLY_FULL_GROUP_BY
    MIN(LEFT(original_file_row, 300)) AS sample_json
FROM trade_history
WHERE broker_id = 240
  AND fk_user_id IN (4359, 40888)
GROUP BY fk_user_id
LIMIT 5;
```

**Manual Testing Script:**
```python
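import hashlib
import json

# Import path is an assumption based on the --cov target in "Running Tests"
from pipeline.p01_normalize.brokers.propreports import PropreportsInterpreter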

# Sample order
order = {
    "propreports id": "82041",
    "date/time": "6/11/2025 15:03:12.000",
    "symbol": "SPY",
    "price": "450.50",
}

# Legacy approach
order_legacy = dict(order)
order_legacy["date/time"] = order_legacy["date/time"].split('.')[0]
hash_legacy = hashlib.md5(json.dumps(order_legacy).encode()).hexdigest()

# After fix
df = PropreportsInterpreter.parse_json_content(
    json.dumps([order]),
    portfolio="test",
    reference_code="123"
)
hash_fixed = df["_file_row_hash"][0]

print(f"Legacy: {hash_legacy}")
print(f"Fixed:  {hash_fixed}")
print(f"Match:  {hash_legacy == hash_fixed}")  # Should be True
```

**Checklist:**
- [ ] Execute SQL queries
- [ ] Obtain sample orders from user 4359
- [ ] Obtain sample orders from user 40888
- [ ] Compare hashes manually
- [ ] Validate match rate >= 95%
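
The last checklist item needs a concrete way to compute the match rate. A minimal sketch, assuming the legacy and new hashes have already been paired per order (the function name and sample pairs are hypothetical):

```python
from typing import Iterable, Tuple

TARGET_RATE = 95.0  # percent, from the Phase 1 objectives


def hash_match_rate(pairs: Iterable[Tuple[str, str]]) -> float:
    """Percentage of (legacy_hash, new_hash) pairs that are identical."""
    pairs = list(pairs)
    if not pairs:
        return 0.0  # no data means nothing has been validated
    matches = sum(1 for legacy, new in pairs if legacy == new)
    return 100.0 * matches / len(pairs)


# Placeholder hashes: three of the four pairs match.
sample_pairs = [("a1", "a1"), ("b2", "b2"), ("c3", "zz"), ("d4", "d4")]
rate = hash_match_rate(sample_pairs)
print(f"match rate: {rate:.1f}% (target met: {rate >= TARGET_RATE})")
```

Returning 0.0 for an empty sample is a deliberate choice here so that an empty query result cannot be mistaken for a passing validation.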

#### Step 1.4: Update Fixture Notes (0.5 days)

**File:** `tests/integration/fixtures/propreports.json`

**Update line 17:**
```json
"notes": "Hash formula: strip milliseconds from date/time, MD5(json.dumps(order WITHOUT portfolio/reference_code)). Validated with user 4359 data."
```

**Checklist:**
- [ ] Correct fixture notes
- [ ] Document actual formula
- [ ] Update expected match rates
- [ ] Run integration tests

### Success Metrics

- [ ] Hash match rate >= 95% (vs. current ~0%)
- [ ] Hashes match legacy for the same input
- [ ] Data integrity maintained (100%)
- [ ] All 8 tests passing
- [ ] No regressions in existing 54 tests

### Rollback Plan

If hash fix causes issues:
1. Revert lines 173-178 to original
2. Re-deploy previous version
3. Investigate with more sample data
4. Consider format-specific hash computation

---

## Fase 2: Data Validation (1.5 días)

### Objectives

- [ ] Add 6 data validations
- [ ] Prevent invalid data from entering the pipeline
- [ ] Achieve rejection rate < 0.1%
- [ ] Pass all 18 validation tests

### Complexity

- **Level:** LOW-MEDIUM
- **Risk:** LOW
- **Effort:** 1.5 days
- **Tests:** 18 new tests

### Validations

| # | Validation | Priority | Tests | Effort |
|---|------------|----------|-------|--------|
| 1 | Required Fields | ⭐⭐⭐ HIGH | 5 | 0.5 days |
| 2 | NASD Fee | ⭐⭐⭐ HIGH | 3 | 0.1 days |
| 3 | Side Validation | ⭐⭐ MEDIUM-HIGH | 3 | 0.25 days |
| 4 | Commission Fallback | ⭐⭐ MEDIUM | 3 | 0.25 days |
| 5 | COVER Mapping | ⭐⭐ MEDIUM | 2 | 0.1 days |
| 6 | Swap Calculation | ⭐ MEDIUM | 2 | 0.25 days |

### Implementation

#### Step 2.1: Required Fields Validation (0.5 days)

**File:** `propreports.py:183+`

```python
# After line 183 in parse_json_content
symbol = get_value("symbol", "Symbol")
date_time = get_value("date/time", "Date/Time")
price = get_value("price", "Price")
side = get_value("b/s", "B/S")

if not symbol or not date_time or not price or not side:
    logger.warning(
        f"[PROPREPORTS] Skipping order {order.get('propreports id', 'unknown')}: "
        f"missing required fields"
    )
    continue
```

**Tests:** 5 (all fields validation + individual field skips)
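
One possible refinement, sketched below, is to report which fields are missing so the warning is more useful. `REQUIRED_FIELDS` and `missing_required_fields` are hypothetical names, and unlike the `not price` check above this version treats a numeric price of 0 as present.

```python
from typing import Dict, List, Tuple

# (lowercase key, capitalized key) pairs, mirroring the get_value() calls above.
REQUIRED_FIELDS: List[Tuple[str, str]] = [
    ("symbol", "Symbol"),
    ("date/time", "Date/Time"),
    ("price", "Price"),
    ("b/s", "B/S"),
]


def missing_required_fields(order: Dict[str, object]) -> List[str]:
    """Return the names of required fields that are absent or blank."""
    missing = []
    for lower, capitalized in REQUIRED_FIELDS:
        value = order.get(lower, order.get(capitalized))
        if value is None or str(value).strip() == "":
            missing.append(lower)
    return missing


order = {"Symbol": "SPY", "Date/Time": "6/11/2025 15:03:12", "Price": 0, "B/S": ""}
print(missing_required_fields(order))
```

The returned list can be interpolated into the log message, for example `missing required fields: b/s`.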

#### Step 2.2: NASD Fee Missing (0.1 days)

**File:** `propreports.py:97-104`

```python
FEE_COLUMNS: ClassVar[list] = [
    "ecn fee", "sec", "orf", "cat", "taf", "nfa", "nscc", "nasd", "acc", "clr", "misc"
]  # Added "nasd"

FEE_COLUMNS_ORIGINAL: ClassVar[list] = [
    "Ecn Fee", "SEC", "ORF", "CAT", "TAF", "NFA", "NSCC", "NASD", "Acc", "Clr", "Misc"
]  # Added "NASD"
```

**Tests:** 3 (NASD in sum, fee calculation, all columns)
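
For the "NASD in sum" test, a per-order summation can serve as a reference. This is a plain-Python sketch rather than the Polars expression the interpreter presumably uses; `sum_fees` is a hypothetical helper and the case-insensitive key match is an assumption.

```python
from typing import Dict

FEE_COLUMNS = [
    "ecn fee", "sec", "orf", "cat", "taf", "nfa", "nscc", "nasd", "acc", "clr", "misc",
]


def sum_fees(order: Dict[str, object]) -> float:
    """Sum every fee column, matching keys case-insensitively."""
    lowered = {key.lower(): value for key, value in order.items()}
    total = 0.0
    for column in FEE_COLUMNS:
        raw = lowered.get(column)
        if raw is None or raw == "":
            continue
        try:
            total += float(raw)
        except (TypeError, ValueError):
            continue  # non-numeric fee value; skipped in this sketch
    return total


order = {"Ecn Fee": "0.10", "SEC": 0.02, "NASD": "0.03", "Misc": ""}
print(round(sum_fees(order), 2))
```

Dropping `"nasd"` from the list reproduces the current under-counting, which gives the test a clear before and after.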

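#### Step 2.3: Side Validation (0.25 days)

**File:** `propreports.py:310+`

```python
# In normalize(), after side mapping
.with_columns([
    # normalize() is an instance method, so use self rather than cls.
    # Polars >= 1.0: replace_strict() replaces the removed map_dict();
    # on older releases keep map_dict() with the same arguments.
    pl.col("b/s").replace_strict(self.SIDE_MAP, default="INVALID").alias("side"),
])
.filter(pl.col("side").is_in(["BUY", "SELL"]))
```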

**Tests:** 3 (valid sides pass, invalid filtered, only BUY/SELL)
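
The same rule expressed per record, which may be handy for the three tests. This sketch is not the Polars pipeline itself: `normalize_side` is a hypothetical helper, the map includes the COVER entry that is still to be added, and the trimming and upper-casing are an assumption that the exact-match lookup above does not make.

```python
from typing import Optional

SIDE_MAP = {
    "B": "BUY", "BUY": "BUY", "COVER": "BUY",
    "S": "SELL", "SELL": "SELL", "SHORT": "SELL", "T": "SELL",
}


def normalize_side(raw: object) -> Optional[str]:
    """Map a raw B/S value to BUY or SELL, or None when it is not recognized."""
    if raw is None:
        return None
    return SIDE_MAP.get(str(raw).strip().upper())


rows = [{"b/s": "B"}, {"b/s": "cover"}, {"b/s": "X"}, {"b/s": None}]
valid = [
    {**row, "side": normalize_side(row["b/s"])}
    for row in rows
    if normalize_side(row["b/s"]) is not None
]
print([row["side"] for row in valid])
```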

#### Step 2.4: Commission Fallback (0.25 days)

**File:** `propreports.py:330+`

```python
# Commission with exec fallback.
# Cast first: strict=False turns "" into null, so a single numeric comparison
# covers the empty-string case (comparing one column to both 0 and "" fails).
comm = pl.col("comm").cast(pl.Float64, strict=False)

pl.when(comm.is_null() | (comm == 0)).then(
    pl.coalesce(pl.col("exec"), pl.col("Exec"), pl.lit(0.0))
).otherwise(
    pl.coalesce(comm, pl.col("Comm"), pl.lit(0.0))
).abs().alias("commission")
```

**Tests:** 3 (exec fallback, comm preferred, zero handling)
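
The three test cases are easier to reason about against a scalar version of the rule. A minimal sketch, assuming the intended behavior is "use comm unless it is missing, blank, or zero"; `to_float` and `resolve_commission` are hypothetical helpers, not part of the interpreter.

```python
from typing import Optional


def to_float(value: object) -> Optional[float]:
    """Convert to float, treating None, "" and non-numeric text as missing."""
    if value is None or (isinstance(value, str) and value.strip() == ""):
        return None
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def resolve_commission(comm: object, exec_fee: object) -> float:
    """Use comm when it is present and non-zero; otherwise fall back to exec."""
    comm_value = to_float(comm)
    if comm_value is None or comm_value == 0:
        fallback = to_float(exec_fee)
        return abs(fallback) if fallback is not None else 0.0
    return abs(comm_value)


print(resolve_commission("-1.25", "0.50"))  # comm preferred
print(resolve_commission("", "-0.50"))      # exec fallback
print(resolve_commission(0, None))          # zero handling
```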

#### Step 2.5: COVER Mapping (0.1 days)

**File:** `propreports.py:90-94`

```python
SIDE_MAP: ClassVar[dict] = {
    "B": "BUY",
    "BUY": "BUY",
    "COVER": "BUY",  # Add this
    "S": "SELL",
    "SELL": "SELL",
    "SHORT": "SELL",
    "T": "SELL",
}
```

**Tests:** 2 (COVER maps to BUY, SIDE_MAP contains COVER)

#### Step 2.6: Swap Calculation (0.25 days - OPTIONAL)

**Decision Required:** Clarify with stakeholder if swap should be ECN * -1

**If YES:**
```python
# Add to parse_json_content records
"_ecn_fee": ecn_value  # For later swap calculation

# In normalize()
pl.col("_ecn_fee").mul(-1.0).alias("swap")
```

**If NO:**
- Document that the change is intentional
- ECN stays in fees, swap = 0.0

**Tests:** 2 (swap calculation or ECN in fees)

### Success Metrics

- [ ] All 18 validation tests passing
- [ ] Rejection rate < 0.1%
- [ ] No false positives (valid data rejected)
- [ ] Meaningful warning messages in logs
- [ ] No regressions in existing tests

---

## Fase 3: CSV Support (0.5-4 días - CONDICIONAL)

### Decision Gate

**BLOCKER:** Run this SQL query BEFORE implementing

```sql
SELECT
    COUNT(DISTINCT user_id) AS users,
    COUNT(*) AS imports,
    MAX(created_at) AS last_import,
    -- Denominator uses the same 12-month window as the outer query
    ROUND(100.0 * COUNT(*) / (
        SELECT COUNT(*)
        FROM import_files
        WHERE broker_id = 240
          AND created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
    ), 2) AS csv_pct
FROM import_files
WHERE broker_id = 240
  AND file_name LIKE '%.csv'
  AND created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH);
```

**Criterion:**
- **CSV percentage >= 5%:** IMPLEMENT (3-4 days)
- **CSV percentage < 5%:** SKIP (0.5 days, documentation only)

### CSV Format Analysis

The legacy implementation has **3 different CSV parser formats** (676 lines):

#### Format 1: getcsv - EQUITIES (Lines 24-281) - ~257 lines

**Characteristics:**
- Identifier: 'EQUITIES' in content
- Split trades (entry + exit on separate rows)
- Header detection: Contains 'B/S' and 'DATE/TIME'

**Effort:** 1 day (if needed)

#### Format 2: getcsv2 - B/S + DATE/TIME (Lines 283-421) - ~138 lines

**Characteristics:**
- Combined Date/Time column
- Fee columns (7-10)
- Call/Put field

**Effort:** 1 day (if needed)

#### Format 3: getcsv3 - ROUTE (Lines 423-676) - ~253 lines

**Characteristics:**
- Route information
- Order Id, Fill Id fields
- Time-based ordering

**Effort:** 1 day (if needed)

### Implementation Plan (IF REQUIRED)

#### Step 3.1: CSV Parser 1 (1 day)
- [ ] Implement parse_csv_equities() method
- [ ] Handle split trades
- [ ] Column mapping
- [ ] Tests (10+)

#### Step 3.2: CSV Parser 2 (1 day)
- [ ] Implement parse_csv_bs_datetime() method
- [ ] Handle combined Date/Time
- [ ] Fee columns mapping
- [ ] Tests (10+)

#### Step 3.3: CSV Parser 3 (1 day)
- [ ] Implement parse_csv_route() method
- [ ] Handle Route field
- [ ] Time-based ordering
- [ ] Tests (10+)

#### Step 3.4: Auto-Detection (0.5 days)
- [ ] Update detector.py for CSV
- [ ] Priority order (EQUITIES > B/S+DATE/TIME > ROUTE)
- [ ] Fallback logic
- [ ] Tests (3+)
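
The priority order can be sketched as a small detector. This is an illustration under assumptions, not the legacy logic: `detect_csv_format` and its return labels are hypothetical, and the markers are taken only from the format characteristics listed above (an 'EQUITIES' identifier, 'B/S' plus 'DATE/TIME' headers, a 'ROUTE' column).

```python
from typing import Optional


def detect_csv_format(content: str) -> Optional[str]:
    """Return "equities", "bs_datetime", "route", or None when nothing matches."""
    upper = content.upper()

    # Priority 1: the EQUITIES identifier anywhere in the content.
    if "EQUITIES" in upper:
        return "equities"

    # Split every line into a set of column names to find header rows.
    rows = [
        {column.strip().strip('"') for column in line.split(",")}
        for line in upper.splitlines()
    ]

    # Priority 2: a row carrying both B/S and DATE/TIME.
    if any({"B/S", "DATE/TIME"} <= row for row in rows):
        return "bs_datetime"

    # Priority 3: a row carrying a ROUTE column.
    if any("ROUTE" in row for row in rows):
        return "route"

    return None  # fallback: let the caller reject or try another broker


print(detect_csv_format("Symbol,B/S,Date/Time,Price\nSPY,B,6/11/2025 15:03:12,450.5\n"))
```

Checking the formats in strict priority order means a file that carries both a B/S + DATE/TIME header and a ROUTE column is classified as the second format; whether that matches the legacy behavior would need to be confirmed against real files.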

#### Step 3.5: Documentation (0.5 days)
- [ ] Update README
- [ ] Integration tests
- [ ] Performance testing

### Implementation Plan (IF NOT REQUIRED)

#### Step 3.6: Documentation Only (0.5 days)
- [ ] Document CSV not supported
- [ ] List 3 legacy formats (reference)
- [ ] Explain decision (CSV usage < 5%)
- [ ] Migration guide if needed

### Success Metrics (IF IMPLEMENTED)

- [ ] All 3 CSV formats supported
- [ ] Auto-detection accuracy >= 95%
- [ ] 33+ CSV tests passing
- [ ] Integration tests updated
- [ ] No regressions

---

## Testing Strategy

### Test Organization

```
tests/brokers/test_propreports.py (554 → ~700 lines after the changes)
├── TestPropreportsFormatDetection (existing)
├── TestPropreportsBasicNormalization (existing)
├── TestPropreportsFieldTransformations (existing)
├── TestPropreportsJSONParsing (existing)
├── TestPropreportsDataIntegrity (existing)
├── TestPropreportsHashCompatibility (NEW - Phase 1)
│   └── 8 tests
├── TestPropreportsDataValidation (NEW - Phase 2)
│   └── 18 tests
└── TestPropreportsCSVParsing (NEW - Phase 3, conditional)
    └── 33+ tests
```

### Test Coverage Goals

| Phase | Tests | Coverage Target |
|-------|-------|-----------------|
| Current | 54 | ~85% |
| Post-Phase 1 | 62 | ~88% |
| Post-Phase 2 | 80 | ~90% |
| Post-Phase 3 | 113+ | ~92% |

### Running Tests

```bash
# All tests
pytest tests/brokers/test_propreports.py -v

# Specific phase
pytest tests/brokers/test_propreports.py::TestPropreportsHashCompatibility -v
pytest tests/brokers/test_propreports.py::TestPropreportsDataValidation -v

# With coverage
pytest tests/brokers/test_propreports.py \
  --cov=pipeline.p01_normalize.brokers.propreports \
  --cov-report=term-missing \
  --cov-report=html

# Coverage report
open htmlcov/index.html
```

### Performance Testing

```bash
# Benchmark processing
python -m pytest tests/brokers/test_propreports.py::test_large_file_performance -v -s

# Memory profiling
python -m memory_profiler tests/brokers/test_propreports_memory.py

# Load testing
python scripts/load_test_propreports.py --orders 10000 --iterations 100
```

---

## Deployment Checklist

### Pre-Deployment

#### Code Review
- [ ] All code changes reviewed
- [ ] No hardcoded credentials
- [ ] Error handling appropriate
- [ ] Logging statements meaningful
- [ ] Comments explain WHY not WHAT

#### Testing
- [ ] All unit tests passing (80+)
- [ ] Integration tests passing
- [ ] Performance tests acceptable
- [ ] Coverage >= 90%
- [ ] Manual testing completed

#### Documentation
- [ ] README updated
- [ ] CHANGELOG updated
- [ ] Migration guide created
- [ ] Known issues documented

### Deployment Steps

#### Step 1: Deploy to Staging (1 day)

```bash
# 1. Create deployment branch
git checkout -b deploy/propreports-hash-fix

# 2. Run all tests
pytest tests/brokers/test_propreports.py -v

# 3. Deploy to staging
./deploy.sh staging propreports

# 4. Monitor logs
tail -f logs/staging/propreports.log
```

**Checklist:**
- [ ] Code deployed to staging
- [ ] Staging tests passing
- [ ] Logs clean (no errors)
- [ ] Sample data processed correctly
- [ ] Hash match rate >= 95%

#### Step 2: Validate with Real Data (0.5 days)

```bash
# Process sample of real production data
python scripts/validate_propreports_hashes.py \
  --sample-size 1000 \
  --compare-legacy

# Expected output:
# Hash Match Rate: 98.2%
# Data Integrity: 100.0%
# Rejection Rate: 0.08%
```

**Checklist:**
- [ ] Real data processed successfully
- [ ] Hash match rate >= 95%
- [ ] No data corruption
- [ ] Rejection rate < 0.1%
- [ ] Performance acceptable

#### Step 3: Deploy to Production (0.5 days)

```bash
# 1. Create production deployment
git tag v1.1.0-propreports-hash-fix
git push origin v1.1.0-propreports-hash-fix

# 2. Deploy to production
./deploy.sh production propreports

# 3. Monitor for 24 hours
watch -n 300 'python scripts/monitor_propreports.py'
```

**Checklist:**
- [ ] Production deployment successful
- [ ] No errors in logs
- [ ] Hash match rate >= 95%
- [ ] No duplicate trades
- [ ] Performance acceptable
- [ ] Users not reporting issues

### Post-Deployment

#### Monitoring (1 week)

```bash
# Daily monitoring
python scripts/monitor_propreports.py --daily-report

# Metrics to track:
# - Hash match rate
# - Rejection rate
# - Processing time
# - Error rate
# - User complaints
```

**Checklist:**
- [ ] Day 1: No critical issues
- [ ] Day 3: Metrics stable
- [ ] Day 7: All metrics within targets
- [ ] User feedback positive
- [ ] No rollback required

#### Documentation

- [ ] Deployment notes updated
- [ ] Known issues documented
- [ ] Metrics baseline recorded
- [ ] Team notified
- [ ] User communication sent (if needed)

### Rollback Plan

If critical issues arise:

```bash
# 1. Immediate rollback
./rollback.sh production propreports v1.0.0

# 2. Investigate root cause
python scripts/debug_propreports_issue.py --incident-id XYZ

# 3. Fix and redeploy
# (Return to Step 1: Deploy to Staging)
```

**Triggers for Rollback:**
- Hash match rate < 90%
- Error rate > 1%
- Data corruption detected
- Performance degradation > 50%
- Critical user complaints

---

## Success Metrics Summary

### Phase 1: Critical Hash Fix

| Metric | Target | Current (Pre-Fix) | Expected (Post-Fix) |
|--------|--------|-------------------|---------------------|
| Hash Match Rate | >= 95% | ~0% ⚠️⚠️⚠️ | >= 95% ✅ |
| Data Integrity | 100% | 100% ✅ | 100% ✅ |
| Tests Passing | 62/62 | 54/54 ✅ | 62/62 ✅ |
| Trade Duplication | 0% | 100% ⚠️⚠️⚠️ | 0% ✅ |

### Phase 2: Data Validation

| Metric | Target | Expected |
|--------|--------|----------|
| Rejection Rate | < 0.1% | < 0.1% ✅ |
| False Positives | 0% | 0% ✅ |
| Tests Passing | 80/80 | 80/80 ✅ |
| Coverage | >= 90% | >= 90% ✅ |

### Phase 3: CSV Support (Conditional)

| Metric | Target |
|--------|--------|
| Format Detection Accuracy | >= 95% |
| CSV Tests Passing | 33+/33+ (113+ tests in total) |
| Performance | < 2x slowdown |

---

## Appendix

### A. File Locations

```
pipeline/p01_normalize/
├── brokers/propreports/
│   ├── __init__.py
│   ├── detector.py
│   └── propreports.py         ← MAIN IMPLEMENTATION
├── tests/brokers/
│   └── test_propreports.py    ← TESTS
└── old_code_from_legacy/
    ├── propreports_export.py  ← REFERENCE (API)
    └── brokers_propreports.py ← REFERENCE (CSV parsers)
```

### B. Key Line Numbers

**propreports.py:**
- Line 90-94: SIDE_MAP (add COVER)
- Line 97-104: FEE_COLUMNS (add NASD)
- Line 163-178: Hash computation (CRITICAL FIX)
- Line 183+: Required fields validation
- Line 310+: Side validation filter
- Line 330+: Commission fallback

**test_propreports.py:**
- Add `TestPropreportsHashCompatibility` class (8 tests)
- Add `TestPropreportsDataValidation` class (18 tests)
- Add `TestPropreportsCSVParsing` class (33+ tests, conditional)

### C. Dependencies

```python
# propreports.py
import polars as pl
import json
import hashlib
from typing import ClassVar, Set, List, Dict, Any
from datetime import datetime
import logging

# test_propreports.py
import pytest
import polars as pl
import json
import hashlib
from datetime import datetime
```

### D. Logging Guidelines

```python
# Warning (invalid data - skip)
logger.warning(
    f"[PROPREPORTS] Skipping order {order_id}: "
    f"reason description"
)

# Info (business decision - skip)
logger.info(
    f"[PROPREPORTS] Skipping order {order_id}: "
    f"business rule description"
)

# Debug (normal flow)
logger.debug(
    f"[PROPREPORTS] Processing order {order_id}: "
    f"details"
)
```

### E. Contact

**Questions or Issues:**
- Check documentation: `PLAN_ANALISIS_VALIDACIONES_PROPREPORTS.md`
- Review examples: `EJEMPLOS_CAMBIOS_CODIGO.md`
- Review changes log: `CAMBIOS_IMPLEMENTADOS.md`

---

**Last Updated:** 2026-01-14
**Version:** 1.0
**Status:** READY FOR IMPLEMENTATION - START WITH PHASE 1 (CRITICAL)
