Cointime

A Feature Engineering Case Study in Consistency and Fraud Detection

Validated Venture

Main Takeaways

  • As the world’s largest crypto exchange, we need a risk detection system that is fast without compromising on accuracy.
  • The challenge we encountered was ensuring our models always used up-to-date information, especially when detecting suspicious account activity in real time.
  • To achieve stronger feature consistency and greater production speed, we now make reasonable assumptions about our data and combine our batch and streaming pipelines. 

Discover how our feature engineering pipeline creates strong, consistent features to detect fraudulent withdrawals on the Binance platform. 

Inside our machine learning (ML) pipeline — which you can learn more about in a previous article — we recently built an automated feature engineering pipeline that funnels raw data into reusable online features that can be shared across all risk-related models. 

In the process of building and testing this pipeline, our data scientists encountered an intriguing feature consistency problem: How do we create accurate sets of online features that dynamically change over time?

Consider this real-world scenario: a crypto exchange (in this case, Binance) is trying to detect fraudulent withdrawals before money leaves the platform. One possible solution is to add a feature to the model that measures the time elapsed since the user’s last specific operation (e.g., logging in or binding a mobile device). It would look something like this:

user_id|last_bind_google_time_diff_in_days|...

1|3.52|...
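As a minimal sketch of how such a value could be derived, the snippet below computes the number of days between a user's last operation and the moment of inference. The function name, the timestamps, and the dictionary layout are illustrative assumptions, not the production implementation.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def time_diff_in_days(last_event_time: datetime, now: datetime) -> float:
    """Days elapsed between a user's last operation and `now`."""
    return (now - last_event_time).total_seconds() / SECONDS_PER_DAY


# Hypothetical timestamps chosen so that the result matches the sample row above.
last_bind = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 4, 12, 28, 48, tzinfo=timezone.utc)

row = {
    "user_id": 1,
    "last_bind_google_time_diff_in_days": round(time_diff_in_days(last_bind, now), 2),
}
print(row)
```

Because the value depends on `now`, it grows for every user at every moment, even when no new event arrives.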

The Challenge of Implementation

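Because this feature measures elapsed time, its value changes continuously for every user, even when no new event occurs. Keeping it current would mean recalculating and updating a key for every user in the online feature store, and that number of keys is impractical. A streaming pipeline, such as Flink, cannot do this on its own, since it can only calculate features for users whose records are coming into Kafka at the present moment.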

As a compromise, we could use a batch pipeline and accept some delay. Let’s say a model can tolerate features that are about one hour old when it fetches them from the online feature store for real-time inference. If the batch pipeline can finish calculating and ingesting data into the feature store within that hour, it would, in theory, solve the problem.

Unfortunately, there’s one glaring issue: using such a batch pipeline is highly time-consuming. This makes finishing within one hour unfeasible when you’re the world’s largest crypto exchange dealing with approximately a hundred million users and a TPS limit for writes.  
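A rough back-of-the-envelope calculation shows why the write limit matters. The sketch below assumes one key per user and a one-hour window, taken from the figures mentioned above; the actual TPS limit is not stated here, so it is left out.

```python
def required_write_tps(num_keys: int, window_seconds: int) -> float:
    """Sustained writes per second needed to refresh every key within the window."""
    return num_keys / window_seconds


# Assumptions: roughly 100 million users, one key each, refreshed within one hour.
tps = required_write_tps(num_keys=100_000_000, window_seconds=3_600)
print(f"{tps:,.0f} writes per second")
```

Any write limit below this sustained rate would make a full hourly refresh impossible, before even counting the time spent computing the features.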

We’ve found that the best practice is to make assumptions about our users, thereby shrinking the amount of data going into our feature store. 

Easing the Issue With Practical Assumptions

Online features are ingested in real time and are constantly changing because they represent the most up-to-date version of an environment. With active Binance users, we cannot afford to use models with outdated features.

It’s imperative that our system flags any suspicious withdrawals as soon as possible. Any added delay, even by a few minutes, means more time for a malicious actor to get away with their crimes. 

So, for the sake of efficiency, we assume recent logins hold relatively higher risk:

  • A fixed delay of 3 hours (3/24 = 0.125 day) introduces a much smaller relative error when the true value is 250 days (250 + 0.125 days) than when it is 1 day (1 + 0.125 days).
  • Most operations won’t exceed a certain threshold; let’s say 365 days. To save time and computing resources, we omit users who haven’t logged in for over a year. 
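The two assumptions above can be illustrated with a short sketch. It reads "error" as the delay divided by the true feature value, which is an interpretation rather than a stated definition, and the helper names are hypothetical.

```python
DELAY_DAYS = 3 / 24        # 0.125 day, the batch delay used in the example above
MAX_AGE_DAYS = 365         # threshold beyond which users are omitted


def relative_error(true_days: float, delay_days: float = DELAY_DAYS) -> float:
    """Share of the true value that a stale reading is off by."""
    return delay_days / true_days


def keep_for_batch(days_since_last_login: float) -> bool:
    """Only users who logged in within the last year are recalculated."""
    return days_since_last_login <= MAX_AGE_DAYS


print(f"{relative_error(250):.2%}")   # old login: the delay barely matters
print(f"{relative_error(1):.2%}")     # recent login: the delay distorts the value

candidates = {"a": 0.5, "b": 250.0, "c": 400.0}  # hypothetical users
kept = [user for user, days in candidates.items() if keep_for_batch(days)]
print(kept)
```

Under this reading, the same delay distorts a 1-day value far more than a 250-day value, which is why recent logins are the ones that need fresh data.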

Our Solution

We use a lambda architecture, which combines batch and streaming pipelines, to achieve stronger feature consistency.

What does the solution look like conceptually?

  • Batch Pipeline: Performs feature engineering for a massive user base.
  • Streaming Pipeline: Remedies batch pipeline delay time for recent logins.
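Conceptually, the two pipelines can be thought of as two views that are merged at read time. The following is a simplified sketch under that assumption; the in-memory dictionaries and function names are hypothetical stand-ins, not the actual pipeline or feature store.

```python
from datetime import datetime, timezone
from typing import Dict, Optional

SECONDS_PER_DAY = 86_400
View = Dict[int, datetime]  # user_id -> time of the user's last operation


def latest_event_time(batch_view: View, stream_view: View, user_id: int) -> Optional[datetime]:
    """Prefer whichever view holds the more recent event for this user."""
    candidates = [view[user_id] for view in (batch_view, stream_view) if user_id in view]
    return max(candidates) if candidates else None


def feature_in_days(batch_view: View, stream_view: View, user_id: int, now: datetime) -> Optional[float]:
    """Days since the last operation, or None if the user appears in neither view."""
    last = latest_event_time(batch_view, stream_view, user_id)
    if last is None:
        return None
    return (now - last).total_seconds() / SECONDS_PER_DAY


utc = timezone.utc
batch_view = {1: datetime(2024, 1, 1, tzinfo=utc), 2: datetime(2023, 6, 1, tzinfo=utc)}
stream_view = {1: datetime(2024, 1, 10, 9, 0, tzinfo=utc)}  # user 1 acted again after the batch run
now = datetime(2024, 1, 10, 12, 0, tzinfo=utc)

for user_id in (1, 2, 3):
    print(user_id, feature_in_days(batch_view, stream_view, user_id, now))
```

The batch view covers the large, slowly changing population, while the streaming view only needs to hold the small set of users who acted since the last batch run.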

What if a record is ingested into the online feature store during the batch ingestion delay?

Our features still maintain strong consistency even when records are ingested during the one-hour batch ingestion delay period. This is because the online feature store we use at Binance returns the latest value based on the event_time you specify when retrieving the value.
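To see why ordering by event_time protects consistency, consider the toy model below. It is an illustrative in-memory class, not the API of the feature store used in production, and it assumes that a write with an older event_time never replaces a newer one.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Tuple


class ToyOnlineStore:
    """Keeps, per key, only the record with the latest event_time."""

    def __init__(self) -> None:
        self._rows: Dict[int, Tuple[datetime, Any]] = {}

    def put(self, key: int, value: Any, event_time: datetime) -> None:
        current = self._rows.get(key)
        # A late-arriving write with an older event_time is ignored.
        if current is None or event_time >= current[0]:
            self._rows[key] = (event_time, value)

    def get(self, key: int) -> Optional[Any]:
        row = self._rows.get(key)
        return row[1] if row is not None else None


utc = timezone.utc
store = ToyOnlineStore()

# The streaming pipeline writes a fresh record at 10:30.
store.put(1, "from_stream", event_time=datetime(2024, 1, 10, 10, 30, tzinfo=utc))
# The batch job, based on a 10:00 snapshot, finishes ingesting afterwards.
store.put(1, "from_batch", event_time=datetime(2024, 1, 10, 10, 0, tzinfo=utc))

print(store.get(1))
```

Although the batch write arrives later in wall-clock time, its older event_time means the fresher streaming record is the one returned.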
