Cointime

A Feature Engineering Case Study in Consistency and Fraud Detection

Validated Venture

Main Takeaways

  • As the world’s largest crypto exchange, we need a risk detection system that is fast without compromising on accuracy.
  • The challenge we encountered was ensuring our models always used up-to-date information, especially when detecting suspicious account activity in real time.
  • To achieve stronger feature consistency and greater production speed, we now make reasonable assumptions about our data and combine our batch and streaming pipelines. 

Discover how our feature engineering pipeline creates strong, consistent features to detect fraudulent withdrawals on the Binance platform. 

Inside our machine learning (ML) pipeline — which you can learn more about in a previous article — we recently built an automated feature engineering pipeline that funnels raw data into reusable online features that can be shared across all risk-related models. 

In the process of building and testing this pipeline, our data scientists encountered an intriguing feature consistency problem: How do we create accurate sets of online features that dynamically change over time?

Consider this real-world scenario: A crypto exchange (in this case, Binance) is trying to detect fraudulent withdrawals before money leaves the platform. One possible solution is to add a feature to the model that measures the time elapsed since the user’s last specific operation (e.g., a login or a mobile binding). It would look something like this:

user_id|last_bind_google_time_diff_in_days|...

1|3.52|...
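As a rough illustration, the feature value is simply the gap between the inference time and the timestamp of the user’s last matching operation, expressed in days. The function name and timestamps below are hypothetical and are not taken from Binance’s pipeline.

```python
from datetime import datetime, timedelta, timezone

SECONDS_PER_DAY = 86_400


def time_diff_in_days(last_event_time: datetime, now: datetime) -> float:
    """Return the time elapsed between last_event_time and now, in days."""
    return (now - last_event_time).total_seconds() / SECONDS_PER_DAY


# Hypothetical example: the user bound Google authentication 3.52 days ago.
now = datetime(2023, 1, 10, 12, 0, tzinfo=timezone.utc)
last_bind_google = now - timedelta(days=3.52)

feature_row = {
    "user_id": 1,
    "last_bind_google_time_diff_in_days": round(
        time_diff_in_days(last_bind_google, now), 2
    ),
}
```

Note that the value keeps growing as `now` advances, even when the user does nothing, which is what makes this kind of feature hard to keep fresh.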

The Challenge of Implementation

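Calculating and updating this feature in an online feature store is impractical because of the number of keys involved: the elapsed time changes continuously for every user, so every key would need constant refreshing. A streaming pipeline alone, such as one built on Flink, cannot do this, since it can only compute features for users whose records are arriving in Kafka at that moment.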

As a compromise, we could use a batch pipeline and accept some delay. Let’s say a model can fetch features from an online feature store and perform real-time inference in around one hour. At the same time, if it takes one hour for a feature store to finish calculating and ingesting data, the batch pipeline would — in theory — solve the problem.

Unfortunately, there’s one glaring issue: such a batch pipeline is highly time-consuming. Finishing within one hour is infeasible when you’re the world’s largest crypto exchange, dealing with approximately a hundred million users and a TPS limit on writes.
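A back-of-the-envelope calculation shows why the one-hour budget is hard to meet. The user count comes from the text; the write limit below is a purely hypothetical figure chosen for illustration, since the article does not state the actual TPS limit.

```python
def full_refresh_hours(num_keys: int, writes_per_second: float) -> float:
    """Hours needed to write every key once at a sustained write rate."""
    return num_keys / writes_per_second / 3600


NUM_USERS = 100_000_000      # "approximately a hundred million users" (from the text)
ASSUMED_WRITE_TPS = 10_000   # hypothetical write limit, for illustration only

hours = full_refresh_hours(NUM_USERS, ASSUMED_WRITE_TPS)
print(f"Full refresh takes about {hours:.2f} hours")
```

Under this assumed limit, writing every key once takes nearly three hours, before counting any time spent computing the features.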

We’ve found that the best practice is to make assumptions about our users, thereby shrinking the amount of data going into our feature store. 

Easing the Issue With Practical Assumptions

Online features are ingested in real time and change constantly because they represent the most up-to-date version of an environment. With active Binance users, we cannot afford to use models with outdated features.

It’s imperative that our system flags any suspicious withdrawals as soon as possible. Any added delay, even by a few minutes, means more time for a malicious actor to get away with their crimes. 

So, for the sake of efficiency, we assume that recent logins carry relatively higher risk:

  • A stale value matters less for older operations. With a 3-hour delay (3/24 = 0.125 day), reporting 250 days instead of 250.125 days is a far smaller relative error than reporting 1 day instead of 1.125 days.
  • For most operations, the elapsed time won’t exceed a certain threshold; let’s say 365 days. To save time and computing resources, we omit users who haven’t logged in for over a year.
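The two assumptions above can be sketched numerically. The definition of relative error used here (delay divided by the true, up-to-date value) is one reasonable reading of the comparison rather than a formula stated in the article, and the function names are hypothetical.

```python
BATCH_DELAY_DAYS = 3 / 24   # 0.125 day, the delay used in the text
MAX_AGE_DAYS = 365          # threshold from the text


def relative_error(stale_value_days: float,
                   delay_days: float = BATCH_DELAY_DAYS) -> float:
    """Error of a stale feature value relative to its true, up-to-date value."""
    true_value_days = stale_value_days + delay_days
    return delay_days / true_value_days


def keep_for_batch(time_diff_days: float) -> bool:
    """Keep only users whose last operation falls within the threshold."""
    return time_diff_days <= MAX_AGE_DAYS


error_old = relative_error(250.0)   # roughly 0.05%
error_new = relative_error(1.0)     # roughly 11%
```

The same 3-hour delay distorts a 1-day value by about two hundred times more than a 250-day value, which is why fresh data matters most for recent operations.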

Our Solution

We use a lambda architecture, which combines batch and streaming pipelines, to achieve stronger feature consistency.

What does the solution look like conceptually?

  • Batch Pipeline: Performs feature engineering for a massive user base.
  • Streaming Pipeline: Remedies batch pipeline delay time for recent logins.
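A minimal sketch of how the two paths might write into one store, assuming a plain dictionary as a stand-in for the online feature store. The feature name, function names, and the choice to reset the value to zero on a login event are assumptions made for illustration, not details of Binance’s implementation.

```python
from typing import Dict, Iterable, Tuple

MAX_AGE_DAYS = 365.0
FEATURE = "last_login_time_diff_in_days"

# Hypothetical in-memory stand-in for the online feature store.
online_store: Dict[int, Dict[str, float]] = {}


def run_batch(last_login_age_days: Iterable[Tuple[int, float]]) -> None:
    """Batch path: recompute the feature for the whole (filtered) user base."""
    for user_id, age_days in last_login_age_days:
        if age_days <= MAX_AGE_DAYS:   # skip long-inactive users
            online_store[user_id] = {FEATURE: age_days}


def on_login_event(user_id: int) -> None:
    """Streaming path: a login just happened, so the elapsed time restarts at 0."""
    online_store[user_id] = {FEATURE: 0.0}


run_batch([(1, 3.52), (2, 250.0), (3, 400.0)])
on_login_event(2)   # user 2 logs in after the batch snapshot was taken
```

Here the batch run covers users 1 and 2 and skips user 3, while the streaming handler corrects user 2 without waiting for the next batch run.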

What if a record is ingested into the online feature store during the batch ingestion delay?

Our features still maintain strong consistency even when records are ingested during the one-hour batch ingestion delay period. This is because the online feature store we use at Binance returns the latest value based on the event_time you specify when retrieving the value.
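The behavior described here can be illustrated with a toy store that resolves reads by `event_time` rather than by write order. The class, its methods, and the timestamps are hypothetical; a real online feature store exposes its own API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FeatureRecord:
    event_time: float   # when the value was true (epoch seconds)
    value: float


class ToyFeatureStore:
    """Hypothetical store that resolves reads by event_time, not write order."""

    def __init__(self) -> None:
        self._records: Dict[int, List[FeatureRecord]] = {}

    def put(self, user_id: int, event_time: float, value: float) -> None:
        self._records.setdefault(user_id, []).append(
            FeatureRecord(event_time, value)
        )

    def get_latest(self, user_id: int) -> Optional[float]:
        records = self._records.get(user_id)
        if not records:
            return None
        return max(records, key=lambda r: r.event_time).value


store = ToyFeatureStore()
# The streaming write arrives first, stamped with the time of the login itself.
store.put(user_id=2, event_time=1_700_003_000.0, value=0.0)
# The batch write lands later but carries its older snapshot time.
store.put(user_id=2, event_time=1_700_000_000.0, value=250.0)
latest = store.get_latest(2)
```

Even though the batch record is written last, the read returns the streaming value because its `event_time` is more recent, so a late batch write cannot overwrite fresher data.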
