# Implemented Changes - Propreports Normalizer

## General Status

**Start Date:** 2026-01-14
**Status:** 📋 PLANNED - Implementation pending
**Broker:** Propreports (ID: 240)
**Current Hash Match Rate:** ~0% ⚠️⚠️⚠️

---

## Progress Overview

| Phase | Description | Status | Tests | Effort |
|-------|-------------|--------|-------|--------|
| **Phase 1** | Critical Hash Fix | ⏳ Pending | 0/8 | 1-2 days |
| **Phase 2** | Data Validation | ⏳ Pending | 0/18 | 1.5 days |
| **Phase 3** | CSV Support (Conditional) | ⏳ Pending | 0/33+ | 0.5-4 days |
| **TOTAL** | | **0% Complete** | **0/59+** | **3-7.5 days** |

---

## Phase 1: Critical Hash Fix (1-2 days) ⚠️⚠️⚠️

### Goal
Raise the hash match rate from ~0% to 95%+.

### Tasks

#### 1.1 Remove Portfolio/Reference_Code from Hash
**File:** `propreports.py:173-178`
**Status:** ⏳ Pending

- [ ] Remove lines 174-175 (portfolio/reference_code added to the hash input)
- [ ] Keep millisecond stripping (line 171)
- [ ] Verify that the hash is legacy-compatible
- [ ] Document the change in code comments

**Current Code (PROBLEM):**
```python
# Lines 173-178
order_for_hash = dict(order)
order_for_hash[datetime_key] = dt_value.split('.')[0]
order_for_hash["portfolio"] = portfolio  # ← REMOVE
order_for_hash["reference_code"] = reference_code  # ← REMOVE
file_row_hash = hashlib.md5(json.dumps(order_for_hash).encode('utf-8')).hexdigest()
```

**Proposed Code (SOLUTION):**
```python
# Lines 173-178
order_for_hash = dict(order)
order_for_hash[datetime_key] = dt_value.split('.')[0]
# Portfolio and reference_code NOT included in hash (legacy compatibility)
file_row_hash = hashlib.md5(json.dumps(order_for_hash).encode('utf-8')).hexdigest()
```
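
The proposed formula can be exercised in isolation. The following is a minimal sketch, not the normalizer's actual code: the helper name `legacy_row_hash` and the sample rows are hypothetical, and it assumes the datetime value is a string whose fractional seconds follow the first `.`. It also shows why `test_hash_json_key_order()` matters: `json.dumps` without `sort_keys` serializes keys in insertion order, so the same row read in a different column order yields a different hash.

```python
import hashlib
import json


def legacy_row_hash(order: dict, datetime_key: str) -> str:
    """Hash a raw order row following the proposed (legacy-compatible) formula.

    Assumption: the datetime value looks like '2026-01-14 09:30:00.123';
    everything after the first '.' is dropped before hashing.
    """
    order_for_hash = dict(order)  # copy so the caller's row is not mutated
    order_for_hash[datetime_key] = str(order[datetime_key]).split(".")[0]
    # portfolio / reference_code are deliberately NOT added to the hash input
    payload = json.dumps(order_for_hash).encode("utf-8")
    return hashlib.md5(payload).hexdigest()


# Hypothetical sample rows, for illustration only
row_a = {"Date/Time": "2026-01-14 09:30:00.123", "Symbol": "AAPL", "B/S": "B"}
row_b = {"Date/Time": "2026-01-14 09:30:00.987", "Symbol": "AAPL", "B/S": "B"}
row_c = {"Symbol": "AAPL", "Date/Time": "2026-01-14 09:30:00.123", "B/S": "B"}

# Rows that differ only in milliseconds hash identically
same_ignoring_ms = (
    legacy_row_hash(row_a, "Date/Time") == legacy_row_hash(row_b, "Date/Time")
)
# Rows with the same content but a different key order do NOT
same_ignoring_key_order = (
    legacy_row_hash(row_a, "Date/Time") == legacy_row_hash(row_c, "Date/Time")
)
```

If the legacy system serialized rows with sorted keys or a different column order, the hashes would still diverge, so the key order used by the legacy importer should be confirmed against real data.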

#### 1.2 Add Hash Compatibility Tests
**File:** `test_propreports.py`
**Status:** ⏳ Pending

- [ ] test_hash_without_portfolio_reference_code()
- [ ] test_hash_matches_legacy_format()
- [ ] test_hash_deterministic()
- [ ] test_hash_milliseconds_stripped()
- [ ] test_hash_json_key_order()
- [ ] test_hash_with_sample_data()
- [ ] test_hash_user_4359_compatibility()
- [ ] test_hash_user_40888_compatibility()

**Total Tests:** 8

#### 1.3 Manual Testing with Legacy Data
**Status:** ⏳ Pending

- [ ] Run Query 1 (verify portfolio in legacy data)
- [ ] Get sample orders for user 4359
- [ ] Get sample orders for user 40888
- [ ] Compare hashes manually
- [ ] Validate match rate >= 95%
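
The manual comparison above could be scripted along these lines. This is a sketch under one stated assumption: the plan does not define "match rate", so it is taken here as the share of legacy hashes that the new normalizer reproduces. The function name and the sample hashes are hypothetical.

```python
def hash_match_rate(new_hashes, legacy_hashes) -> float:
    """Percentage of legacy hashes reproduced by the new normalizer.

    Assumption: match rate = matched legacy hashes / all legacy hashes.
    """
    legacy = set(legacy_hashes)
    if not legacy:
        return 0.0  # nothing to compare against
    matched = legacy & set(new_hashes)
    return 100.0 * len(matched) / len(legacy)


# Hypothetical placeholder hashes, not real data
legacy = ["a1", "b2", "c3", "d4"]
new = ["a1", "b2", "c3", "zz"]

rate = hash_match_rate(new, legacy)
passes_gate = rate >= 95.0  # the Phase 1 target
```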

#### 1.4 Update Fixture Notes
**File:** `tests/integration/fixtures/propreports.json`
**Status:** ⏳ Pending

- [ ] Fix the notes on line 17 (remove "100% match")
- [ ] Document the actual formula: "strip ms, MD5(order WITHOUT portfolio/ref)"
- [ ] Update expected_executions
- [ ] Update expected_trades

---

## Phase 2: Data Validation (1.5 days)

### Goal
Prevent invalid data and improve compatibility.

### Tasks

#### 2.1 Required Fields Validation
**File:** `propreports.py:183+`
**Status:** ⏳ Pending

- [ ] Validate symbol is not empty
- [ ] Validate date_time is not empty
- [ ] Validate price is not empty/zero
- [ ] Validate side is not empty
- [ ] Add logger.warning for skipped rows
- [ ] Test: test_required_fields_validation()
- [ ] Test: test_empty_symbol_skipped()
- [ ] Test: test_empty_date_skipped()
- [ ] Test: test_empty_price_skipped()
- [ ] Test: test_empty_side_skipped()

**Total Tests:** 5

**Proposed Code:**
```python
# propreports.py, after line 183
symbol = get_value("symbol", "Symbol")
date_time = get_value("date/time", "Date/Time")
price = get_value("price", "Price")
side = get_value("b/s", "B/S")

if not symbol or not date_time or not price or not side:
    logger.warning(
        f"[PROPREPORTS] Skipping order {order.get('propreports id', 'unknown')}: "
        f"missing required fields (symbol={bool(symbol)}, date={bool(date_time)}, "
        f"price={bool(price)}, side={bool(side)})"
    )
    continue
```
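
One detail worth checking: if `get_value` returns strings, `not price` is false for `"0"` or `"0.00"`, so a zero price would pass the check above even though the task list asks to reject it. The sketch below shows one way to cover that case. The helper name `missing_required_fields` is hypothetical, and it assumes prices arrive as plain decimal strings without thousands separators.

```python
from decimal import Decimal, InvalidOperation


def missing_required_fields(symbol, date_time, price, side) -> list:
    """Return the names of required fields that are empty.

    Price is also reported when it is zero, non-finite, or unparseable.
    Assumption: prices are plain decimal strings such as '187.25'.
    """
    missing = []
    for name, value in (("symbol", symbol), ("date_time", date_time), ("side", side)):
        if value is None or not str(value).strip():
            missing.append(name)
    try:
        parsed = Decimal(str(price).strip())
        if not parsed.is_finite() or parsed == 0:
            missing.append("price")
    except InvalidOperation:
        # Covers '', 'None', and any other non-numeric text
        missing.append("price")
    return missing


ok = missing_required_fields("AAPL", "2026-01-14 09:30:00", "187.25", "B")
zero_price = missing_required_fields("AAPL", "2026-01-14 09:30:00", "0.00", "B")
all_empty = missing_required_fields("", None, "", "  ")
```

A non-empty result would trigger the `logger.warning` and `continue` path shown in the proposed code.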

#### 2.2 NASD Fee Missing from Sum
**File:** `propreports.py:97-104`
**Status:** ⏳ Pending

- [ ] Add "nasd" to FEE_COLUMNS (line 99)
- [ ] Add "NASD" to FEE_COLUMNS_ORIGINAL (line 103)
- [ ] Verify the fee sum includes NASD
- [ ] Test: test_nasd_fee_included_in_sum()
- [ ] Test: test_fee_calculation_with_nasd()
- [ ] Test: test_all_fee_columns_summed()

**Total Tests:** 3

**Proposed Code:**
```python
FEE_COLUMNS: ClassVar[list] = [
    "ecn fee", "sec", "orf", "cat", "taf", "nfa", "nscc", "nasd", "acc", "clr", "misc"
]  # Added "nasd"

FEE_COLUMNS_ORIGINAL: ClassVar[list] = [
    "Ecn Fee", "SEC", "ORF", "CAT", "TAF", "NFA", "NSCC", "NASD", "Acc", "Clr", "Misc"
]  # Added "NASD"
```
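
To illustrate the effect on the fee sum, here is a small stand-alone sketch. It is not the normalizer's implementation (which operates on a dataframe). The helper `total_fees` and the sample row are hypothetical, and it assumes fee values are decimal strings and that empty cells count as zero.

```python
from decimal import Decimal

# Mirrors the proposed FEE_COLUMNS list, including "nasd"
FEE_COLUMNS = [
    "ecn fee", "sec", "orf", "cat", "taf", "nfa", "nscc", "nasd", "acc", "clr", "misc",
]


def total_fees(row: dict) -> Decimal:
    """Sum every known fee column of a single row, skipping empty cells."""
    total = Decimal("0")
    for column in FEE_COLUMNS:
        raw = row.get(column)
        if raw is None or str(raw).strip() == "":
            continue
        total += Decimal(str(raw))
    return total


# Hypothetical row: with "nasd" in the list, its 0.05 is part of the sum
row = {"ecn fee": "0.30", "sec": "0.02", "nasd": "0.05", "misc": ""}
fees = total_fees(row)
fees_without_nasd = total_fees({k: v for k, v in row.items() if k != "nasd"})
```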

#### 2.3 Side Validation Post-Mapping
**File:** `propreports.py:310+`
**Status:** ⏳ Pending

- [ ] Add .filter() after side mapping
- [ ] Filter to BUY/SELL only
- [ ] Log invalid sides
- [ ] Test: test_side_validation_post_mapping()
- [ ] Test: test_invalid_sides_filtered()
- [ ] Test: test_only_buy_sell_pass()

**Total Tests:** 3

**Proposed Code:**
```python
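# propreports.py, in normalize() after the side mapping (~line 310)
.with_columns([
    # ... existing side mapping ...
    # Note: map_dict is deprecated in recent Polars releases; if the project has
    # upgraded, replace_strict(cls.SIDE_MAP, default="INVALID") is the likely equivalent.
    pl.col("b/s").map_dict(cls.SIDE_MAP, default="INVALID").alias("side"),
])
.filter(pl.col("side").is_in(["BUY", "SELL"]))  # Add this filter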
```
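
The filter above drops invalid sides silently, while the task list also asks to log them. The following plain-Python sketch shows the intended behavior (map, log, filter) without depending on Polars. The function name, logger name, and sample rows are hypothetical; the map mirrors the `SIDE_MAP` proposed in section 2.5, and matching is exact (no case normalization), as in the proposed code.

```python
import logging

logger = logging.getLogger("propreports_sketch")  # hypothetical logger name

# Mirrors the SIDE_MAP proposed in section 2.5
SIDE_MAP = {
    "B": "BUY", "BUY": "BUY", "COVER": "BUY",
    "S": "SELL", "SELL": "SELL", "SHORT": "SELL", "T": "SELL",
}


def map_and_filter_sides(rows: list) -> list:
    """Map raw 'b/s' values to BUY/SELL and drop (with a warning) anything else."""
    kept = []
    for row in rows:
        side = SIDE_MAP.get(row.get("b/s"))
        if side is None:
            logger.warning(
                "[PROPREPORTS] Dropping row with invalid side %r", row.get("b/s")
            )
            continue
        kept.append({**row, "side": side})
    return kept


rows = [{"b/s": "B"}, {"b/s": "COVER"}, {"b/s": "X"}, {"b/s": "T"}]
sides = [row["side"] for row in map_and_filter_sides(rows)]
```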

#### 2.4 Commission Column Fallback
**File:** `propreports.py:330+`
**Status:** ⏳ Pending

- [ ] Modify the commission calculation
- [ ] Add fallback to 'exec' when 'comm' is zero/empty
- [ ] Test: test_commission_exec_fallback()
- [ ] Test: test_commission_comm_preferred()
- [ ] Test: test_commission_exec_when_comm_empty()

**Total Tests:** 3

**Proposed Code:**
```python
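# propreports.py, around line 330
# Commission calculation with exec fallback
# Caveats to verify: comparing one column to both 0 and "" likely fails on a
# typed column (cast first), and every column referenced here ("exec", "Exec",
# "comm", "Comm") must exist in the frame.
pl.when(
    (pl.col("comm").is_null()) | (pl.col("comm") == 0) | (pl.col("comm") == "")
).then(
    pl.coalesce(pl.col("exec"), pl.col("Exec"), pl.lit(0.0))
).otherwise(
    pl.coalesce(pl.col("comm"), pl.col("Comm"), pl.lit(0.0))
).abs().alias("commission")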
```
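
The intended row-level semantics can be stated in plain Python, which may help when writing the three tests listed above. This is a sketch with a hypothetical helper name, not the dataframe implementation. It assumes values arrive as numbers or numeric strings. Note the consequence flagged in the risk list: a genuine zero commission is replaced by the `exec` value whenever one is present.

```python
def commission_with_exec_fallback(comm, exec_fee) -> float:
    """Prefer 'comm'; fall back to 'exec' when 'comm' is empty or zero.

    The result is always non-negative, matching the .abs() in the proposed code.
    """
    def to_float(value):
        if value is None or str(value).strip() == "":
            return None
        return float(value)

    comm_value = to_float(comm)
    if comm_value is not None and comm_value != 0:
        return abs(comm_value)
    exec_value = to_float(exec_fee)
    return abs(exec_value) if exec_value is not None else 0.0


comm_preferred = commission_with_exec_fallback("-1.25", "0.40")
exec_when_empty = commission_with_exec_fallback("", "-0.40")
exec_when_zero = commission_with_exec_fallback("0", "0.40")
nothing_available = commission_with_exec_fallback(0, None)
```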

#### 2.5 "COVER" Side Mapping
**File:** `propreports.py:90-94`
**Status:** ⏳ Pending

- [ ] Add "COVER": "BUY" to SIDE_MAP
- [ ] Test: test_cover_side_mapping()
- [ ] Test: test_cover_maps_to_buy()

**Total Tests:** 2

**Proposed Code:**
```python
SIDE_MAP: ClassVar[dict] = {
    "B": "BUY",
    "BUY": "BUY",
    "COVER": "BUY",  # Add this
    "S": "SELL",
    "SELL": "SELL",
    "SHORT": "SELL",
    "T": "SELL",
}
```

#### 2.6 Swap Calculation from ECN (OPTIONAL)
**File:** `propreports.py:360+`
**Status:** ⏳ Pending (if a change is needed)

- [ ] Clarify with the stakeholder whether swap should be ecn*-1
- [ ] If YES: modify the swap calculation
- [ ] If NO: document that the change is intentional
- [ ] Test: test_swap_calculation() (if implemented)
- [ ] Test: test_ecn_in_fees() (if NOT changed)

**Total Tests:** 2

**Note:** ECN is currently counted under "fees", not under swap. Verify whether this is intentional.

---

## Phase 3: CSV Support (0.5-4 days - CONDITIONAL) 🎯

### Decision Gate

#### SQL Query Execution
**Status:** ⏳ Pending

- [ ] Run Query 2 (CSV usage check)
- [ ] Analyze results
- [ ] Decision: CSV > 5% ? IMPLEMENT : SKIP

**SQL Query:**
```sql
SELECT
    COUNT(DISTINCT user_id) as users,
    COUNT(*) as imports,
    MAX(created_at) as last_import,
    -- Denominator restricted to the same 12-month window as the CSV count
    ROUND(100.0 * COUNT(*) / (
        SELECT COUNT(*) FROM import_files
        WHERE broker_id = 240
          AND created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH)
    ), 2) as csv_pct
FROM import_files
WHERE broker_id = 240
  AND file_name LIKE '%.csv'
  AND created_at >= DATE_SUB(NOW(), INTERVAL 12 MONTH);
```
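
The gate itself is simple arithmetic, but the boundary is easy to get wrong: with a strict `>` comparison, exactly 5% means SKIP. A minimal sketch follows; the function name and the sample counts are hypothetical, and both counts are assumed to cover the same time window.

```python
def csv_decision(csv_imports: int, total_imports: int, threshold_pct: float = 5.0) -> str:
    """Apply the Phase 3 decision gate: IMPLEMENT only if CSV share > threshold."""
    if total_imports <= 0:
        return "SKIP"  # no imports in the window, nothing to support
    csv_pct = round(100.0 * csv_imports / total_imports, 2)
    return "IMPLEMENT" if csv_pct > threshold_pct else "SKIP"


# Hypothetical counts, for illustration only
above = csv_decision(12, 200)    # 6.0% of imports are CSV
boundary = csv_decision(10, 200)  # exactly 5.0%
empty = csv_decision(0, 0)
```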

### Tasks (IF CSV > 5%)

#### 3.1 CSV Format 1 - EQUITIES
**File:** `propreports.py` (new method)
**Status:** ⏳ Pending

- [ ] Implement parse_csv_equities() method
- [ ] Handle split trades (entry + exit on separate rows)
- [ ] Column mapping
- [ ] Tests (10+)

**Estimate:** 1 day

#### 3.2 CSV Format 2 - B/S + DATE/TIME
**File:** `propreports.py` (new method)
**Status:** ⏳ Pending

- [ ] Implement parse_csv_bs_datetime() method
- [ ] Handle combined Date/Time column
- [ ] Fee columns (7-10)
- [ ] Tests (10+)

**Estimate:** 1 day

#### 3.3 CSV Format 3 - ROUTE
**File:** `propreports.py` (new method)
**Status:** ⏳ Pending

- [ ] Implement parse_csv_route() method
- [ ] Handle Route information
- [ ] Time-based ordering
- [ ] Tests (10+)

**Estimate:** 1 day

#### 3.4 CSV Auto-Detection
**File:** `detector.py`
**Status:** ⏳ Pending

- [ ] Update can_handle() for CSV detection
- [ ] Priority order (EQUITIES > B/S+DATE/TIME > ROUTE)
- [ ] Fallback logic
- [ ] Tests (3+)

**Estimate:** 0.5 days
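
A priority-ordered detector could look roughly like the sketch below. The header signatures are placeholders: the plan does not list the actual columns of each legacy format, so the sets here (especially the EQUITIES one) are assumptions to be replaced with the real headers. The function name and format labels are also hypothetical.

```python
from typing import Optional

# Checked in priority order: EQUITIES > B/S+DATE/TIME > ROUTE.
# Column sets are PLACEHOLDERS, not the real legacy headers.
FORMAT_SIGNATURES = [
    ("EQUITIES", {"symbol", "entry", "exit"}),  # assumed marker columns
    ("BS_DATETIME", {"b/s", "date/time"}),
    ("ROUTE", {"route"}),
]


def detect_csv_format(header: list, signatures=FORMAT_SIGNATURES) -> Optional[str]:
    """Return the first format whose required columns all appear in the header."""
    columns = {name.strip().lower() for name in header}
    for format_name, required in signatures:
        if required <= columns:  # subset test
            return format_name
    return None  # fallback: unknown CSV layout


# A header matching both B/S+DATE/TIME and ROUTE resolves to the higher priority
both = detect_csv_format(["Date/Time", "Symbol", "B/S", "Route"])
route_only = detect_csv_format(["Time", "Symbol", "Route"])
unknown = detect_csv_format(["foo", "bar"])
```

Returning `None` for an unknown layout leaves the fallback decision to the caller, for example rejecting the file in `can_handle()`.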

#### 3.5 CSV Integration Testing
**Status:** ⏳ Pending

- [ ] Integration tests for each format
- [ ] Performance testing
- [ ] Documentation
- [ ] Update baselines

**Estimate:** 0.5 days

### Tasks (IF CSV <= 5%)

#### 3.6 Documentation Only
**File:** `README.md`, `PLAN_*.md`
**Status:** ⏳ Pending

- [ ] Document that CSV is not supported
- [ ] List the 3 legacy formats (for reference)
- [ ] Explain the decision (CSV usage <= 5%)
- [ ] Migration guide in case users need CSV

**Estimate:** 0.5 days

---

## Testing Summary

### Unit Tests Required

| Category | Tests | Status |
|-----------|-------|--------|
| **Hash Compatibility** | 8 | ⏳ 0/8 |
| **Required Fields** | 5 | ⏳ 0/5 |
| **NASD Fee** | 3 | ⏳ 0/3 |
| **Side Validation** | 3 | ⏳ 0/3 |
| **Commission** | 3 | ⏳ 0/3 |
| **COVER Mapping** | 2 | ⏳ 0/2 |
| **Swap (optional)** | 2 | ⏳ 0/2 |
| **CSV (conditional)** | 33+ | ⏳ 0/33+ |
| **TOTAL** | **59+** | **⏳ 0/59+** |

### Integration Tests

- [ ] Update fixtures/propreports.json
- [ ] Update baselines/propreports.json
- [ ] Run full integration test suite
- [ ] Verify hash match rate >= 95%
- [ ] Verify rejection rate < 0.1%

---

## Progress Metrics

### Code Changes

| File | Current Lines | Estimated Lines | Delta |
|------|---------------|-----------------|-------|
| propreports.py | 423 | ~470-490 | +47-67 |
| detector.py | 32 | ~40-50 (if CSV) | +8-18 |
| test_propreports.py | 554 | ~650-700 | +96-146 |
| **TOTAL** | **1,009** | **1,160-1,240** | **+151-231** |

### Hash Match Rate Progress

| Milestone | Target | Actual | Status |
|-----------|--------|--------|--------|
| **Pre-Fix** | N/A | ~0% | ⚠️⚠️⚠️ |
| **Post-Phase 1** | >= 95% | TBD | ⏳ |
| **Post-Phase 2** | >= 95% | TBD | ⏳ |
| **Post-Phase 3** | >= 95% | TBD | ⏳ |

### Data Validation Progress

| Validation | Status |
|------------|--------|
| Required Fields | ⏳ Pending |
| NASD Fee | ⏳ Pending |
| Side Validation | ⏳ Pending |
| Commission Fallback | ⏳ Pending |
| COVER Mapping | ⏳ Pending |
| Swap Calculation | ⏳ Pending |

---

## Deployment Checklist

### Pre-Deployment

- [ ] All unit tests passing (59+)
- [ ] Integration tests passing
- [ ] Hash match rate >= 95%
- [ ] Rejection rate < 0.1%
- [ ] Code review completed
- [ ] Documentation updated

### Staging Deployment

- [ ] Deploy to staging
- [ ] Run smoke tests
- [ ] Process sample data (user 4359)
- [ ] Verify hash matches
- [ ] Monitor logs (24 hours)

### Production Deployment

- [ ] Deploy to production
- [ ] Monitor hash match rate
- [ ] Monitor rejection rate
- [ ] Check for duplicates
- [ ] User feedback
- [ ] Performance metrics

---

## Rollback Plan

If critical problems occur:
- [ ] Identify the issue
- [ ] Roll back to the previous version
- [ ] Investigate the root cause
- [ ] Fix and re-test in staging
- [ ] Re-deploy

---

## Implementation Notes

### Priorities

1. **CRITICAL:** Phase 1 (hash fix) - MUST be completed before any deployment
2. **HIGH:** Phase 2 (validations) - MUST be completed before production
3. **MEDIUM:** Phase 3 (CSV) - ONLY if the query shows a need

### Identified Risks

1. **Hash incompatibility** - Mitigated with exhaustive testing
2. **NASD fee missing** - Affects financial accuracy
3. **CSV support gap** - Mitigated with the SQL query decision gate
4. **Commission fallback** - May affect trades with a genuine zero commission
5. **Swap calculation change** - Requires business-logic clarification

### Dependencies

- SQL database access for the research queries
- Legacy data samples (users 4359, 40888)
- Stakeholder input on the swap calculation
- CSV usage statistics for the Phase 3 decision

---

**Last Updated:** 2026-01-14
**Next Review:** Post-Phase 1
**Owner:** Development Team
**Status:** 📋 READY TO START - PHASE 1 FIRST
