Confluent supported the infrastructure of the company that showcased the Cricket T20 World Cup and the IPL, with 70 mn+ concurrent users. In a candid chat, Greg Taylor, SVP, APAC Sales, Confluent, discusses with Rajneesh De, Group Editor, CXO Media & APAC Media, how Confluent is leveraging IBM's strong distribution network and existing customer relationships since becoming an IBM company following the recent acquisition.
How does Confluent's solution address the challenge of data fragmentation and data heterogeneity in enterprises so that AI can be leveraged more effectively for more accurate insights?
We have what we call the data streaming platform, which is underpinned by a number of technologies. In the context of data fragmentation, one of them is the open-source project Kafka, which sits underneath the data streaming platform; the other is Flink.
And then on top of that, we have built a ton of capability; we recently released something called Confluent Intelligence. So, the first piece: data fragmentation occurs because the data is sitting in silos across many different systems, in many formats, structured, unstructured and semi-structured.
The traditional way was to put everything in a data warehouse, or to have a data lake and put everything in the data lake. As things modernized over time, the two were brought together. But the fundamental issue with data fragmentation still exists: data that is in motion, real-time data, is not captured efficiently by either of those capabilities.
However, this is very expensive. What is super interesting about Kafka is that you can create a topic, that topic can carry context around all the different data, and many different data sources can fill that one topic. If that topic needs to be consumed, some of it by this agent and some by that capability, the point is that each consumer can take what it wants.
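To illustrate the pattern being described, here is a toy, in-memory sketch in Python. It is not the real Kafka API (real Kafka involves brokers and client libraries); the record shapes and the list standing in for a topic are invented for illustration, showing many sources filling one topic while each consumer keeps only what it wants.

```python
# Toy illustration of the idea behind a Kafka topic: several sources write
# records of mixed shape into one shared "topic", and each consumer reads
# the stream but keeps only the records it cares about.
# (All names and shapes here are hypothetical.)

topic = []  # stands in for a Kafka topic

# Different data sources fill the same topic.
topic.append({"source": "crm",  "type": "structured",   "customer": "A", "spend": 120})
topic.append({"source": "logs", "type": "unstructured", "text": "user A clicked offer"})
topic.append({"source": "erp",  "type": "structured",   "customer": "B", "spend": 45})

# One consumer (say, a fraud agent) takes only the structured spend events...
spend_events = [r for r in topic if r.get("type") == "structured"]

# ...while another takes only the free-text records.
text_events = [r for r in topic if r.get("type") == "unstructured"]

print(len(spend_events), len(text_events))  # → 2 1
```

The point of the sketch is that the producers do not need to know who consumes the topic; each downstream agent applies its own selection.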
When you talk about data coming from multiple sources, some of those sources can be unstructured, which typically happens in a large enterprise. So some data would be structured and some unstructured; how would you take care of that?
The beauty of it is we subscribe to the fact that we care about data that is in motion.
If you have batch data, that is not where you would want to use Confluent. You would use one of our adjacent partners, whether that be Snowflake, Databricks, Redshift or whatever else. The constraining factor with a batch set of data processes is that time and speed are limited.
Also, it is fairly expensive to store and process tons and tons of unstructured data. So our point of view is that in a fragmented data environment, it depends on the applications you have, but in general, for an analytics application you are going to leverage both the data streaming platform and your more modern data platform, because the latter does not have that real-time capability.
However, what we are seeing is a pattern around something we call shift-left. Shift-left means we have a strong point of view that the closer you get to where the data is produced, the data in motion, the faster you can make decisions.
You can also make decisions now on what data actually gets put into the analytics engine, because the prevailing thought is that you take everything, put it into the data lake or the lakehouse, do the analytics there, and then make the decision. One, that is very expensive; usually 60% of the data does not need to go into the data lake for decision-making. And two, time.
It is usually about 40% less efficient time-wise, because you are using batch data. So the data streaming platform allows you to do a couple of things. One, be very efficient with the data that you actually put in and analyze. Two, the things that you need in real time, whether it is fraud analytics or a quick-commerce offer you want to put in, can all be done at the point in time that the user, or even an agent, is actually touching your data set, which is incredibly important.
So when you say data fragmentation, you still want to make sure that you have all the important data in the data platform. The question now, because data is proliferating so quickly, is whether there is a way to do that effectively and efficiently; if there is, you probably want to do it.
Another challenge which most enterprises still face is that multiple vendors are involved in their platforms or applications. In terms of integration with all these multi-vendor platforms, how do you perform?
That is one of the powerful things about Confluent, because when most folks want to create a data pipeline, that pipeline has many, many touch points. The constraint in the past was that in order to move something from Oracle into your data pipeline, for example, you had to build your own custom capability. It is very brittle and fragile, and you need the person who wrote the code.
We have over 200 connectors, and those connectors go into things like MQ, the Z series mainframe and Oracle; our Oracle XStream connector is one of the most popular and most widely used connectors in our portfolio. The reason is that you can pull out the information, expose it, and pick up only the stuff that you need, but it is all there and available, and you do not have to program into the Oracle database.
Same thing with AWS, across the whole ecosystem. You can also build some connectors of your own if you want, but that rather defeats the purpose of a managed connector, because we keep it up to date, make sure it has the latest capability and keep it at the latest release.
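For readers unfamiliar with what "a connector" looks like in practice, a rough illustration follows. It is patterned on the widely documented Kafka Connect JDBC source connector; the name, connection URL, table and column values are placeholders, and the exact properties vary by connector.

```json
{
  "name": "orders-db-source",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
    "connection.url": "jdbc:oracle:thin:@//db-host:1521/ORCL",
    "table.whitelist": "ORDERS",
    "mode": "incrementing",
    "incrementing.column.name": "ORDER_ID",
    "topic.prefix": "db-",
    "tasks.max": "1"
  }
}
```

The appeal described in the interview is visible here: the pipeline is declared as configuration rather than custom code against the source database.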
Are these connectors templatized depending on the sector or the applications where they are used?
It is about what the endpoints are, and they can be any endpoint. We see that as a really important strategy, because when you work with Confluent, oftentimes we are working on use cases. They are usually business problems the customer is trying to solve, and the business problem is that the data needs to go from one point to another, and that data needs to have integrity across the stream.
So with Kafka you can have topics, say a name-and-address topic. Even though the source might have 25 attributes, that topic will pick up only the two it needs, while another might carry 10 attributes. It is whatever your data pipeline wants to pick up, which is quite a bit more efficient and effective at scale.
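That projection idea can be sketched in a few lines of plain Python. The record, its field names and the `project` helper are all hypothetical; the point is simply that a source record may carry many attributes while a given topic or pipeline declares and keeps only the subset it needs.

```python
# A source record with many attributes (abbreviated here); a topic or
# pipeline declares the subset it needs and picks up only those fields.
source_record = {
    "name": "Asha", "address": "Pune", "phone": "x", "email": "y",
    "loyalty_tier": "gold",  # ...imagine ~25 attributes in total
}

def project(record, wanted):
    """Keep only the attributes this topic's consumers asked for."""
    return {k: record[k] for k in wanted if k in record}

# The name-and-address topic carries just two of the source's attributes.
name_address = project(source_record, ["name", "address"])
print(name_address)  # → {'name': 'Asha', 'address': 'Pune'}
```

At scale, moving two fields instead of twenty-five per record is where the efficiency claim comes from.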
Beyond this efficiency of scale through the connectors, what would be Kafka’s unique differentiators against other competitors?
They are actually complementary. We have a product that we have developed with Databricks and Snowflake called Tableflow.
One of the biggest problems is that when you pull things from the source, you then have to transform them or add context before dropping them into the data lake or the lakehouse. Usually there are a couple of formats you will use, Iceberg or Delta, which are the two most popular. What Tableflow does is support both of those formats.
So you can take the real-time data, with the context already there, and drop it directly into the lakehouse in the right format: Iceberg for Snowflake or Delta for Databricks. It allows you to populate the warehouse more efficiently and to pull data back out more efficiently in both directions.
This looks fine from the technology perspective. But from the business perspective, when everyone is pitching for the same customer, how does it make business sense?
The only reason you would work with Confluent and Tableflow is that you want things backed by Kafka, and it is well known that that capability, the combination of Kafka and Flink, is the fastest for processing and transmitting data.
You can do that with the other vendors we talked about, but then someone has to do the processing work, and someone also has to do the work of categorizing the data once it sits in the database.
Think about what you would typically do in a lakehouse: you would have bronze, silver and gold tables, right? What if you could take just the data that you want and drop it straight into gold? All of a sudden you have less processing, less human or manual intervention on the data, and it is much more manageable.
So using Tableflow, you can take the data, put it into either Iceberg or Delta with all the attributes, and drop it directly in. You save a tremendous amount of time, which is people cost, and a tremendous amount of processing on the other side.
This literally translates into an efficiency and cost advantage.
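The bronze/silver/gold point above can be sketched as a simple stream filter. This is a toy simulation in plain Python, not a Flink or Tableflow API: the event shape and the `is_gold` business rule are invented for illustration, showing qualifying records being selected in the stream so that only they land in the "gold" table.

```python
# Simulated stream of events; in practice this filtering would run in the
# stream processor (for example, a Flink job) before anything lands in
# the lakehouse, instead of refining bronze tables downstream.
stream = [
    {"txn_id": 1, "amount": 250.0, "status": "settled"},
    {"txn_id": 2, "amount": 10.0,  "status": "failed"},   # never needs the lake
    {"txn_id": 3, "amount": 90.0,  "status": "settled"},
]

def is_gold(event):
    """Hypothetical business rule for what belongs in the gold table."""
    return event["status"] == "settled"

gold_table = [e for e in stream if is_gold(e)]

# Only 2 of the 3 events are stored and processed downstream.
print(len(gold_table))  # → 2
```

Dropping the failed transaction in-stream is what saves the storage and processing cost the interview refers to.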
How has the response been amongst enterprise customers for Confluent Intelligence?
We are starting to see the uptake, though the uptake in India has been a little slower.
Over the next couple of months, we are going to announce some things with some pretty big companies that you would know, which will be talking about using the Confluent Intelligence capability for transformation, real-time decision-making and things like that, but they are not quite public yet.
So the uptake has probably been a little slow. We are now connecting AI agents; we announced the capability, and now we are starting to announce the features. The whole idea is that agents need a different context and a different architecture.
We are seeing the enterprises move a little slower, but what they are looking for is different deployment models. They want to use cloud, on-prem or hybrid, all three modes, whereas the other two vendors we talked about are cloud-only.
What is the proportion that is still on-prem?
It depends on the customer and the market. This market is slightly more cloud because we have a lot more digital native penetration.
The BFSI segment is pretty much on-prem because of the security requirements and the regulations that apply to banks.
Most of the banks and the commodities exchanges have pretty deep regulatory requirements. They could do it in the cloud, but the government would not let them.
So if you look at the balance, I would say it is more than 50% on cloud and less than 50% on-prem.
Most of the digital natives however are on cloud, right?
I think you will see more of the banks on-prem, and that’s typically because they’re more regulated. But what’s nice about it is you can actually use all three models together.
And what’s nice about that is certain workloads don’t need to be highly regulated. Cloud is more efficient because we keep it up to date. We keep the manageability.
We do all that. When it’s on-prem, you have more control, but you also have more responsibility. You have to keep it up to date.
And we have less instrumentation to go in, look at your environment and see if it is architected correctly. So what we are working on over the next couple of months are the first components of Confluent Private Cloud, which has a gateway between the two, so you can start to bridge the gap between on-prem and cloud and get visibility out of that.
We are also releasing Universal Stream Manager. That allows you to look at data in motion from your on-prem environment into the cloud, fully governed throughout the whole lifecycle, so you can see it in both directions.
Digital natives are more tech-savvy, but the banks, the other BFSI organizations and manufacturing are typically more legacy-oriented. Their data and analytics teams might not have the relevant skill sets, and manageability might become a challenge. So how much hand-holding do you provide to help the enterprise data team?
Typically, the lifecycle of a customer is that they experience open-source Kafka, which we call OSK, and realize it is highly capable.
You are able to move data from point A to point B using a pipeline mechanism that is really efficient, very fast and pretty low-cost. The challenge is that the manageability on each side is the responsibility of the customer. So the typical pattern we see is that whenever anything fails or falters, the enterprise comes to Confluent and says, give us the managed version.
And then we have deep management capabilities around support and around helping them with implementation to be able to put it in correctly, on the right environment, help them with their pipeline, and help them go live.
Now, it’s harder, as you can imagine, to keep that up-to-date, modernized, and everything. We have to do constant health checks for the customers and things like that.
Confluent Platform, which is the on-premise version, is used at some of the highest scale on the planet. But so is the cloud version. To give you an example, the company that supported the Cricket World Cup and the IPL used this platform.
And that’s when you start to see 70-plus million concurrent users who all are using Confluent Cloud. And it’s taken it to a scale that we’ve never seen in any customer on the planet. So, in theory, you can do it in both platforms. The question is then a matter of what sort of skills you have.
Now, interestingly, in India there is really high technical capability for using our products. So what we do find is folks want the manageability: they want the upgrade support and all the new functions.
But they do have pretty good technical skills. There is a really deep level of skill around Flink and around Kafka in this market, more so than most markets around the world.
You have been at Confluent for a while now. In terms of the GTM, how is the structure, and what are the key initiatives that you would like to highlight?
We think about APAC not as one market but as a collection of geographies.
They are all individual markets. India is, if not the most important, one of the most important markets in APAC. The reasons are, one, a big population, disproportionate to what most markets look like, and partially the uptake from the digital natives, the banks, the telcos and so on here.
So all of the industries are adopting Confluent in this market. But if you ask about the go-to-market, what we are excited about is that most customers start with Kafka, and the data streaming platform is where we see companies going.
And in India, we are starting to see that at scale. Usually you see it with the digital natives, but the banks here are taking it up at scale faster than any banks in the world, other than some of the majors. And even then, we are supporting them out of the GCCs, in places like Bangalore.
So if you look at the go-to-market, we are continuing to make a big investment in India. We have just launched a GCC (Global Capability Centre) initiative, focusing on companies in-market and working closely with those folks here.
And we are also looking to influence back into their HQ facilities.
So is that GCC part about moving support to global customers?
They are actually making decisions for the global customers in Bangalore, in Pune, in Hyderabad.
It is no longer a situation where the decision points are made overseas and the execution is here. We’re seeing both happening here for specific projects, particularly around AI competency.
And most of them are considering the Confluent platform for how they think about their AI and digital infrastructure.
When you look at India as a market, how is the go-to-market structured?
We put people in geographies, but then we think about industry. We treat the digital native business as a separate industry because it is so prevalent in this market. And then we have different industry flavours: we obviously have government, we do a lot with BFSI, we do a lot with the telcos, and then manufacturing.
So if you look across those, we have a focus in the north, a focus in the west, and obviously in Bangalore. That is how we think about it: geography, and then industry across those geographies.
And then we have some specialization around going deep into the data streaming part.
If we look at the next few quarters, what are going to be the key focus areas, both in terms of geography and the sectoral expansion you talked about, and in terms of new features? And where do you see the conversations with the digital teams and digital offices changing?
Digital natives are right out at the forefront of adopting Confluent Intelligence, very closely followed by BFSI. Now, BFSI has a different flavour to it because it is obviously a bit more complex: a lot of legacy, a lot of regulation.
Our go-to-market is going to continue to evolve. We think our GCC capability is going to grow dramatically in this market, and we have made a big bet in that space. Among the industries, we also think government and the public sector are going to become incredibly important for us.
Recently we were acquired by IBM, so we are now an IBM company. As part of that, what we are working on deeply right now is figuring out how we map into some of these government organizations and sectors. Take things like UPI.
That is all underpinned by Confluent. Not all of these are publicly referenceable, but understand that they run hundreds of millions of transactions for hundreds of millions of people, and Confluent is a big part of that ecosystem. So we are pretty excited to be part of it.
We are investing deeply in India and will continue to, both on the technical side and the sales side, working closely from here.
You mentioned the IBM advantage; how do you plan to leverage that? Because, as you also said, IBM already has large existing relationships with many of the large customers. So how do you leverage that advantage?
Distribution. The easiest way to put it is that they have relationships at the C level, with the CIOs, and they are being asked how to expand the customer's digital platform. They are already in the room for those conversations; for us to get into that room, we have to work pretty hard to do it.

