# Code Change Examples - Deribit Normalizer

This document contains before/after examples of all the code changes required for the Deribit normalizer.

**Broker ID:** deribit
**Format:** JSON API (crypto perpetuals + European-style options)
**Current Hash Match Rate:** 74.27% → **Target:** 95%+

---

## PHASE 1: Critical Hash Fix (1-2 days) ⚠️ URGENT

### ⚠️⚠️⚠️ 1. Expired Options Hash Mismatch (CRITICAL)

**File:** `brokers/deribit/deribit.py`
**Lines:** 199-202
**Severity:** HIGH

**Problem:**
- Hash match rate: 74.27% (1,146/1,543 matches)
- 397 expired options (25.73%) use a different hash formula with an "_EXP_A" suffix
- Risk: expired options may be duplicated on re-import
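
The percentages above follow directly from the two counts. A quick arithmetic check in Python (the variable names are illustrative, not taken from the codebase):

```python
# Counts taken from the problem statement above.
total_trades = 1_543
matched = 1_146
unmatched = total_trades - matched  # trades whose legacy hash uses the "_EXP_A" suffix

match_rate = matched / total_trades
unmatched_share = unmatched / total_trades

print(f"unmatched trades: {unmatched}")          # 397
print(f"match rate:       {match_rate:.2%}")     # 74.27%
print(f"unmatched share:  {unmatched_share:.2%}")  # 25.73%
```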

#### BEFORE (Current Code)

```python
# deribit.py lines 199-202
# Compute file_row hash using legacy formula:
# MD5(json.dumps(id)) - no EXP_A suffix for Deribit
trade_id_val = trade.get("id", "")
file_row_hash = hashlib.md5(json.dumps(trade_id_val).encode('utf-8')).hexdigest()
```

**Problem:** Does not handle expired options, whose legacy hash uses the "_EXP_A" suffix

#### AFTER (Proposed Code)

```python
# deribit.py lines 199-220
@classmethod
def _compute_file_row_hash(cls, order_id, instrument_name: str) -> str:
    """
    Compute file_row hash with expired options handling.

    For expired options, append "_EXP_A" suffix to match legacy.

    Args:
        order_id: Trade "id" value from the Deribit API (hashed as-is unless the suffix applies)
        instrument_name: Instrument name (e.g., "BTC-26JUL24-56000-P")

    Returns:
        MD5 hash string
    """
    # Keep the raw value so non-expired hashes stay identical to the previous formula
    hash_input = order_id

    # Check if instrument is expired option
    # Pattern: BTC-DDMMMYY-STRIKE-C/P
    if cls._is_option(instrument_name):
        parts = instrument_name.split('-')
        if len(parts) == 4:
            expiry_str = parts[1]  # e.g., "26JUL24"
            try:
                # Parse expiry date in DDMMMYY format
                expiry_date = datetime.strptime(expiry_str, '%d%b%y')
                # NOTE: compared against the clock at import time, so an option
                # imported before and after its expiry gets two different hashes
                if expiry_date < datetime.now():
                    hash_input = f"{order_id}_EXP_A"
                    logger.debug(f"[DERIBIT] Added EXP_A suffix for expired option: {instrument_name}")
            except ValueError as e:
                # Not a valid date: hash without the suffix
                logger.warning(f"[DERIBIT] Failed to parse expiry date '{expiry_str}': {e}")

    return hashlib.md5(json.dumps(hash_input).encode('utf-8')).hexdigest()

# Usage in parse_json_content (line ~202)
trade_id_val = trade.get("id", "")
# Pass the raw value: json.dumps(596504147) and json.dumps("596504147") hash differently
file_row_hash = cls._compute_file_row_hash(trade_id_val, instrument)
```

**Benefits:**
- Expected hash match rate: 95%+ (currently 74.27%)
- Expired options deduplicated correctly
- Active options unchanged
- Data integrity preserved
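
One detail is worth checking against the legacy data before rollout: `json.dumps` serializes an integer ID and its string form differently, so the two produce different MD5 digests. The sketch below only demonstrates that effect; `md5_of_json` is a hypothetical helper, and which of the two forms the legacy system hashed is not established in this document.

```python
import hashlib
import json


def md5_of_json(value) -> str:
    """MD5 hex digest of the JSON serialization of `value` (hypothetical helper)."""
    return hashlib.md5(json.dumps(value).encode("utf-8")).hexdigest()


int_id = 596504147    # ID as it appears in the sample trade payload
str_id = str(int_id)  # the same ID after str() conversion

print(json.dumps(int_id))  # 596504147
print(json.dumps(str_id))  # "596504147"  (the quotes are part of the hashed bytes)

# Different bytes in, different digest out.
assert md5_of_json(int_id) != md5_of_json(str_id)
```

If active options and perpetuals are meant to keep their current hashes, the value passed to the hash function needs the same type it had before the change.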

#### Test Cases

```python
# tests/brokers/test_deribit.py

def test_expired_option_hash_suffix():
    """Verify that expired options use the _EXP_A suffix."""
    # Arrange: Expired option (26JUL24 in the past)
    order_id = "596504147"
    instrument = "BTC-26JUL24-56000-P"  # Expired option

    # Act
    hash_result = DeribitInterpreter._compute_file_row_hash(order_id, instrument)

    # Assert: Should match legacy formula with suffix
    expected = hashlib.md5(json.dumps("596504147_EXP_A").encode('utf-8')).hexdigest()
    assert hash_result == expected

def test_active_option_hash_no_suffix():
    """Verify that active options do NOT use the suffix."""
    # Arrange: Active option (26DEC26 in the future)
    order_id = "596504147"
    instrument = "BTC-26DEC26-80000-C"  # Future expiry

    # Act
    hash_result = DeribitInterpreter._compute_file_row_hash(order_id, instrument)

    # Assert: Should NOT have suffix
    expected = hashlib.md5(json.dumps("596504147").encode('utf-8')).hexdigest()
    assert hash_result == expected

def test_perpetual_hash_no_suffix():
    """Verify that perpetuals do NOT use the suffix."""
    # Arrange: Perpetual (not an option)
    order_id = "3877357"
    instrument = "ETH_USDT"  # Perpetual

    # Act
    hash_result = DeribitInterpreter._compute_file_row_hash(order_id, instrument)

    # Assert: Should NOT have suffix
    expected = hashlib.md5(json.dumps("3877357").encode('utf-8')).hexdigest()
    assert hash_result == expected

def test_expired_option_integration():
    """Integration test: an expired option is processed correctly."""
    # Arrange: Sample expired option trade
    trade = {
        "id": 596504147,
        "type": "trade",
        "instrument_name": "ETH-30DEC22-3600-P",  # Expired in 2022
        "side": "liquidation buy",
        "price": 0.0007,
        "amount": 2.0,
        "contracts": 2.0,
        "timestamp": 1671469172919,
        "commission": 0.00035,
        "currency": "ETH",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

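    # Assert
    # `.height` is an eager-DataFrame attribute, so no `.collect()` call is needed
    # (assumes parse_json_content returns an eager DataFrame, as the other tests do)
    assert df.height == 1

    # file_row should have EXP_A suffix
    expected_hash = hashlib.md5(json.dumps("596504147_EXP_A").encode('utf-8')).hexdigest()
    assert df["file_row"][0] == expected_hash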
```

**Coverage:**
- ✅ Expired options (past dates)
- ✅ Active options (future dates)
- ✅ Perpetuals (non-options)
- ⚠️ Invalid date formats (handled by the `ValueError` branch; no dedicated test above)
- ✅ Integration test with real data
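
The invalid-date case can be exercised without depending on the wall clock. The following is a minimal, self-contained sketch: `parse_option_expiry` and `is_expired` are hypothetical stand-ins for the logic inside `_compute_file_row_hash`, and `now` is injected so the result does not change as time passes. It assumes an English/C locale for the `%b` month abbreviation.

```python
from datetime import datetime
from typing import Optional


def parse_option_expiry(instrument_name: str) -> Optional[datetime]:
    """Return the expiry of an option-style name, or None if it cannot be parsed."""
    parts = instrument_name.split("-")
    if len(parts) != 4 or parts[3] not in ("C", "P"):
        return None
    try:
        # DDMMMYY, e.g. "26JUL24"; single-digit days such as "5JUL24" also parse
        return datetime.strptime(parts[1], "%d%b%y")
    except ValueError:
        return None


def is_expired(instrument_name: str, now: datetime) -> bool:
    """True only when the name carries a parseable expiry that lies before `now`."""
    expiry = parse_option_expiry(instrument_name)
    return expiry is not None and expiry < now


fixed_now = datetime(2025, 1, 1)  # injected clock keeps the checks deterministic

assert is_expired("BTC-26JUL24-56000-P", fixed_now)      # past expiry
assert not is_expired("BTC-26DEC26-80000-C", fixed_now)  # future expiry
assert not is_expired("BTC-99XYZ24-56000-P", fixed_now)  # invalid date: no suffix
assert not is_expired("ETH_USDT", fixed_now)             # not an option
```

Injecting the clock also avoids tests that start failing once a hard-coded "future" expiry such as 26DEC26 has passed.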

---

## PHASE 2: Data Validation (1-1.5 days)

### ⭐⭐⭐ 2. Symbol Validation (HIGH)

**File:** `brokers/deribit/deribit.py`
**Lines:** ~164 (after instrument assignment)
**Severity:** HIGH

#### BEFORE (No Validation)

```python
# deribit.py line 164
instrument = str(trade.get("instrument_name", ""))
is_option = cls._is_option(instrument)

# NO VALIDATION - empty symbols can get through
```

**Problem:** Empty symbols can enter the system and cause errors in later stages

#### AFTER (With Validation)

```python
# deribit.py lines 164-169
# `or ""` also maps a JSON null to "" (str(None) would yield the string "None")
instrument = str(trade.get("instrument_name") or "")

# Validate instrument name is not empty or whitespace-only
if not instrument.strip():
    logger.warning(f"[DERIBIT] Skipping trade {trade.get('id', 'unknown')}: empty instrument_name")
    continue

is_option = cls._is_option(instrument)
```

#### Test Case

```python
# tests/brokers/test_deribit.py

def test_symbol_empty_validation():
    """Verify that empty symbols are rejected."""
    # Arrange: trade with an empty symbol
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "",  # Empty symbol
        "side": "open buy",
        "price": 2500.0,
        "amount": 10.0,
        "timestamp": 1751852373399,
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0

def test_symbol_whitespace_validation():
    """Verify that whitespace-only symbols are rejected."""
    # Arrange: trade with a whitespace-only symbol
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "   ",  # Whitespace only
        "side": "open buy",
        "price": 2500.0,
        "amount": 10.0,
        "timestamp": 1751852373399,
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0
```

---

### ⭐⭐⭐ 3. Price Validation (HIGH)

**File:** `brokers/deribit/deribit.py`
**Lines:** ~177 (after price assignment)
**Severity:** HIGH

#### BEFORE (No Validation)

```python
# deribit.py line 177
price = float(trade.get("price", 0) or 0)

# NO VALIDATION - zero/negative prices can get through
```

**Problem:** Zero or negative prices cause errors in downstream calculations

#### AFTER (With Validation)

```python
# deribit.py lines 177-182
price = float(trade.get("price", 0) or 0)

# Validate price is positive
if price <= 0:
    logger.warning(f"[DERIBIT] Skipping trade {trade.get('id', 'unknown')}: "
                   f"zero or negative price {price}")
    continue
```

#### Test Cases

```python
# tests/brokers/test_deribit.py

def test_price_zero_validation():
    """Verify that zero prices are rejected."""
    # Arrange: trade with a zero price
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "BTC_USDT",
        "side": "open buy",
        "price": 0.0,  # Zero price
        "amount": 10.0,
        "timestamp": 1751852373399,
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0

def test_price_negative_validation():
    """Verify that negative prices are rejected."""
    # Arrange: trade with a negative price
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "BTC_USDT",
        "side": "open buy",
        "price": -2500.0,  # Negative price
        "amount": 10.0,
        "timestamp": 1751852373399,
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0
```

---

### ⭐⭐ 4. Quantity Validation (MEDIUM-HIGH)

**File:** `brokers/deribit/deribit.py`
**Lines:** ~175-183 (after amount/contracts calculation)
**Severity:** MEDIUM-HIGH

#### BEFORE (No Validation)

```python
# deribit.py lines 175-183
contracts = trade.get("contracts")
amount = float(trade.get("amount", 0) or 0)

if contracts is not None:
    quantity = float(contracts)
else:
    quantity = amount

# NO VALIDATION - zero/negative quantities can get through
```

**Problem:** Orders with zero or negative quantity pollute the data

#### AFTER (With Validation)

```python
# deribit.py lines 175-189
contracts = trade.get("contracts")
amount = float(trade.get("amount", 0) or 0)

if contracts is not None:
    quantity = float(contracts)
else:
    quantity = amount

# Validate quantity is positive
if quantity <= 0:
    logger.warning(f"[DERIBIT] Skipping trade {trade.get('id', 'unknown')}: "
                   f"zero or negative quantity (amount={amount}, contracts={contracts})")
    continue
```

#### Test Cases

```python
# tests/brokers/test_deribit.py

def test_quantity_zero_validation():
    """Verify that zero quantities are rejected."""
    # Arrange: trade with amount=0 and contracts=None
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "BTC_USDT",
        "side": "open buy",
        "price": 2500.0,
        "amount": 0.0,      # Zero amount
        "contracts": None,  # No contracts
        "timestamp": 1751852373399,
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0

def test_quantity_negative_validation():
    """Verify that negative quantities are rejected."""
    # Arrange: trade with a negative amount
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "BTC_USDT",
        "side": "open buy",
        "price": 2500.0,
        "amount": -10.0,    # Negative amount
        "timestamp": 1751852373399,
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0
```

---

### ⭐⭐ 5. Timestamp Validation (MEDIUM)

**File:** `brokers/deribit/deribit.py`
**Lines:** ~210 (before timestamp conversion)
**Severity:** MEDIUM

#### BEFORE (No Validation)

```python
# deribit.py line 210
timestamp_val = trade.get("timestamp", 0)
try:
    timestamp_ms = int(timestamp_val)
    # Convert ms to datetime in UTC
    utc_dt = datetime.utcfromtimestamp(timestamp_ms / 1000)
# (existing exception handling continues here, unchanged)
```

**Problem:** A missing timestamp defaults to 0 and is converted to the epoch date (1970-01-01)

#### AFTER (With Validation)

```python
# deribit.py lines 210-217
timestamp_val = trade.get("timestamp")

# Validate timestamp exists and is not zero (`not` covers None, 0 and "")
if not timestamp_val:
    logger.warning(f"[DERIBIT] Skipping trade {trade.get('id', 'unknown')}: "
                   f"missing timestamp")
    continue

try:
    timestamp_ms = int(timestamp_val)
    # Convert ms to datetime in UTC
    # (utcfromtimestamp is deprecated since Python 3.12)
    utc_dt = datetime.utcfromtimestamp(timestamp_ms / 1000)
# (existing exception handling continues here, unchanged)
```
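
Since `datetime.utcfromtimestamp` is deprecated as of Python 3.12, a timezone-aware conversion may be worth considering. This is a minimal sketch using the sample timestamp from the expired-option trade in Phase 1; whether downstream code expects naive or aware datetimes is an assumption to verify before adopting it.

```python
from datetime import datetime, timezone

timestamp_ms = 1671469172919  # sample value from the expired-option trade

# Timezone-aware equivalent of datetime.utcfromtimestamp(timestamp_ms / 1000)
aware_utc = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
print(aware_utc.isoformat())

# Drop tzinfo only if the rest of the pipeline expects naive UTC datetimes
# (an assumption about the pipeline, not something this document states).
naive_utc = aware_utc.replace(tzinfo=None)
print(naive_utc.date())  # 2022-12-19
```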

#### Test Cases

```python
# tests/brokers/test_deribit.py

def test_timestamp_missing_validation():
    """Verify that missing timestamps are rejected."""
    # Arrange: trade without a timestamp
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "BTC_USDT",
        "side": "open buy",
        "price": 2500.0,
        "amount": 10.0,
        "timestamp": None,  # Missing timestamp
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0

def test_timestamp_zero_validation():
    """Verify that zero timestamps are rejected."""
    # Arrange: trade with timestamp=0
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "BTC_USDT",
        "side": "open buy",
        "price": 2500.0,
        "amount": 10.0,
        "timestamp": 0,  # Zero timestamp (epoch)
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0
```
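
The four checks in sections 2-5 share one shape: read a field, test it, skip the trade with a reason. As an illustration only, they could be gathered into a single pure function that is easy to unit-test in isolation. `rejection_reason` is a hypothetical name, not part of the current codebase.

```python
from typing import Any, Mapping, Optional


def rejection_reason(trade: Mapping[str, Any]) -> Optional[str]:
    """Return why a trade should be skipped, or None if it passes all four checks."""
    instrument = str(trade.get("instrument_name") or "")
    if not instrument.strip():
        return "empty instrument_name"

    price = float(trade.get("price") or 0)
    if price <= 0:
        return f"zero or negative price {price}"

    # Same precedence as the normalizer: contracts wins over amount when present
    contracts = trade.get("contracts")
    quantity = float(contracts) if contracts is not None else float(trade.get("amount") or 0)
    if quantity <= 0:
        return f"zero or negative quantity {quantity}"

    if not trade.get("timestamp"):
        return "missing timestamp"

    return None


valid = {"instrument_name": "BTC_USDT", "price": 2500.0, "amount": 10.0,
         "timestamp": 1751852373399}
print(rejection_reason(valid))                            # None
print(rejection_reason({**valid, "timestamp": 0}))        # missing timestamp
print(rejection_reason({**valid, "instrument_name": ""}))  # empty instrument_name
```

Inside the parsing loop, the caller would log the returned reason and `continue` when it is not `None`.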

---

### ⭐ 7. Side Mapping Completeness (MEDIUM)

**File:** `brokers/deribit/deribit.py`
**Lines:** 238-240
**Severity:** MEDIUM

#### BEFORE (Incomplete Documentation)

```python
# deribit.py lines 238-240
# Map side to action (BUY/SELL)
side_raw = str(trade.get("side", "")).lower()
action = "BUY" if "buy" in side_raw else "SELL"
```

**Problem:**
- The 6 compound side values are not documented
- Unknown sides are not validated: any value without "buy" silently maps to SELL

#### AFTER (With Documentation and Validation)

```python
# deribit.py lines 238-247
# Map compound side values to BUY/SELL
# Deribit uses 6 compound side values:
# - "open buy", "close buy" (legacy label: "close short"), "liquidation buy" → BUY
# - "open sell", "close sell" (legacy label: "close long"), "liquidation sell" → SELL
side_raw = str(trade.get("side", "")).lower()
if any(x in side_raw for x in ("buy", "short")):
    action = "BUY"
elif any(x in side_raw for x in ("sell", "long")):
    action = "SELL"
else:
    action = ""

# Validate side mapping was successful
if not action:
    logger.warning(f"[DERIBIT] Skipping trade {trade.get('id', 'unknown')}: "
                   f"unknown side '{side_raw}'")
    continue
```
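
Substring matching accepts any value that merely contains "buy" or "sell". A stricter alternative, sketched here under the assumption that the six values used in the tests for this section are the complete set, is an explicit lookup table. `SIDE_TO_ACTION` and `map_side` are hypothetical names.

```python
from typing import Optional

# Explicit table of the six compound side values (assumed complete for trades).
SIDE_TO_ACTION = {
    "open buy": "BUY",
    "close buy": "BUY",
    "liquidation buy": "BUY",
    "open sell": "SELL",
    "close sell": "SELL",
    "liquidation sell": "SELL",
}


def map_side(side_raw: object) -> Optional[str]:
    """Return "BUY"/"SELL" for a known side value, or None for anything else."""
    key = str(side_raw or "").strip().lower()
    return SIDE_TO_ACTION.get(key)


print(map_side("Open Buy"))      # BUY
print(map_side("unknown_side"))  # None
print(map_side("buyback"))       # None (substring matching would return BUY)
```

The trade-off is that legacy labels such as "close short" would be rejected unless they are added to the table explicitly.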

#### Test Cases

```python
# tests/brokers/test_deribit.py

def test_side_mapping_all_combinations():
    """Verify that all 6 compound sides map correctly."""
    test_cases = [
        ("open buy", "BUY"),
        ("close buy", "BUY"),      # close short in legacy
        ("liquidation buy", "BUY"),
        ("open sell", "SELL"),
        ("close sell", "SELL"),    # close long in legacy
        ("liquidation sell", "SELL"),
    ]

    for side_raw, expected_side in test_cases:
        # Arrange
        trade = {
            "id": 12345,
            "type": "trade",
            "instrument_name": "BTC_USDT",
            "side": side_raw,
            "price": 2500.0,
            "amount": 10.0,
            "timestamp": 1751852373399,
            "commission": 0.0,
            "currency": "USDT",
        }

        # Act
        df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

        # Assert (df is treated as an eager DataFrame, consistent with `.height`,
        # so no `.collect()` call is needed)
        assert df.height == 1, f"Side '{side_raw}' was rejected"
        actual_side = df["side"][0]
        assert actual_side == expected_side, \
            f"Side '{side_raw}' mapped to '{actual_side}', expected '{expected_side}'"

def test_side_mapping_unknown():
    """Verify that unknown sides are rejected."""
    # Arrange: trade with an unknown side
    trade = {
        "id": 12345,
        "type": "trade",
        "instrument_name": "BTC_USDT",
        "side": "unknown_side",  # Unknown side
        "price": 2500.0,
        "amount": 10.0,
        "timestamp": 1751852373399,
        "commission": 0.0,
        "currency": "USDT",
    }

    # Act
    df = DeribitInterpreter.parse_json_content(json.dumps([trade]))

    # Assert: Should be rejected
    assert df.height == 0
```

---

## Validations Already Implemented ✅

### ✅ 14-22. No Changes Required

These validations are already implemented correctly:

- **Trade Type Filtering** (deribit.py:160-162)
- **JSON Validation** (parse_json_content)
- **Timestamp Conversion** (deribit.py:210-227)
- **Symbol Normalization** (deribit.py:262-268)
- **Option Parsing** (deribit.py:110-127)
- **Asset Type Classification** (deribit.py:447-450)
- **Commission Calculation** (deribit.py:430-435)
- **Pip Value Calculation** (deribit.py:185-197)
- **Column Detection** (detector.py:21)

**Recommendation:** Do not modify these implementations; they work correctly.

---

## Summary of Changes

### Modified Files

1. **`brokers/deribit/deribit.py`**
   - Add `_compute_file_row_hash()` method with expired options logic
   - Add symbol validation (empty check)
   - Add price validation (zero/negative check)
   - Add quantity validation (zero/negative check)
   - Add timestamp validation (missing/zero check)
   - Improve side mapping documentation and validation

2. **`tests/brokers/test_deribit.py`**
   - Add test_expired_option_hash_suffix
   - Add test_active_option_hash_no_suffix
   - Add test_perpetual_hash_no_suffix
   - Add test_expired_option_integration
   - Add test_symbol_empty_validation
   - Add test_symbol_whitespace_validation
   - Add test_price_zero_validation
   - Add test_price_negative_validation
   - Add test_quantity_zero_validation
   - Add test_quantity_negative_validation
   - Add test_timestamp_missing_validation
   - Add test_timestamp_zero_validation
   - Add test_side_mapping_all_combinations
   - Add test_side_mapping_unknown

### Estimated Lines

- **Production Code:** ~60-80 new lines
- **Test Code:** ~200-250 new lines
- **Total:** ~260-330 lines

### Impact

- **Hash Match Rate:** 74.27% → 95%+ ✅
- **Data Quality:** Guards against ~5-10 invalid-input scenarios
- **Test Coverage:** 616 → ~850 lines (comprehensive)
- **Maintainability:** Better documentation and validation

---

## Post-Implementation Verification

### Unit Tests
```bash
# Run all tests
pytest tests/brokers/test_deribit.py -v

# Run specific test categories
pytest tests/brokers/test_deribit.py::test_expired_option_hash_suffix -v
pytest tests/brokers/test_deribit.py::test_symbol_empty_validation -v
pytest tests/brokers/test_deribit.py::test_price_zero_validation -v
```

### Integration Tests
```bash
# Process sample data with expired options
python -m pipeline.p01_normalize.normalize \
    --broker deribit \
    --input sample_with_expired_options.json \
    --output output.parquet

# Verify hash match rate
python scripts/verify_hash_match_rate.py \
    --broker deribit \
    --user-id 49186 \
    --expected-rate 0.95
```

### Manual Verification
```bash
# 1. Check expired options hash
grep "EXP_A" logs/deribit.log

# 2. Verify hash match rate
# Compare file_row hashes with legacy data
# Target: >= 95%

# 3. Check validation warnings
grep "Skipping trade" logs/deribit.log

# 4. Verify rejection rate
# Calculate: rejected_trades / total_trades
# Target: < 0.1%
```
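
The rejection rate in step 4 can be computed from the warning lines themselves. This is a minimal sketch assuming every rejected trade emits exactly one "Skipping trade" warning, as in the validation code above; `rejection_rate` is a hypothetical helper and the sample log lines are illustrative, not real output.

```python
from typing import Iterable


def rejection_rate(log_lines: Iterable[str], total_trades: int) -> float:
    """Fraction of trades skipped, counted from "Skipping trade" warnings."""
    if total_trades <= 0:
        raise ValueError("total_trades must be positive")
    rejected = sum(1 for line in log_lines if "Skipping trade" in line)
    return rejected / total_trades


# Illustrative log lines in the format used by the logger calls in this document.
sample_log = [
    "[DERIBIT] Skipping trade 12345: empty instrument_name",
    "[DERIBIT] Added EXP_A suffix for expired option: BTC-26JUL24-56000-P",
]

rate = rejection_rate(sample_log, total_trades=1543)
print(f"rejection rate: {rate:.4%}")
print("within target" if rate < 0.001 else "above target")  # target: < 0.1%
```

In practice the lines would come from reading `logs/deribit.log`, and `total_trades` from the number of records in the input file.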

---

**Last Updated:** 2026-01-14
**Version:** 1.0
**Owner:** Development Team
