MY HUMBLE BEGINNINGS

My name is Guilherme Bencke, and since I was a little kid (about 8 years old) I have been fascinated by computing and programming. Around 1986 I got my first computer, an MSX Expert (roughly the same specs as a Commodore 64, from the same generation), with its impressive 32 KB of RAM and 16 bright colors! A cassette tape recorder for storage and some magazines with code listings to try. Although the hardware was very limited, it was already possible to see what personal computing could do.
After the 8-bit era came the 16-bit IBM PC XT with its powerful 8088 microprocessor, and then we started to do some real work. I learned Turbo C 2.0, Clipper and Pascal, and started writing some very small applications for my grandfather (who owned a very small export/import sales representation office) and for some of my father's friends, in a time when dBase III databases ruled the business world.
So, in 1998 I entered university as an undergraduate pursuing a degree in Computer Science, which I completed in 2003. Many of my colleagues didn't think it was very important to study so hard for a degree in Computer Science, but almost 20 years later, the low-level algorithms, the formalisms of computing theory, and so many other subjects that are the basis of all current technologies, yet so rarely studied nowadays, have proven invaluable. I sometimes remind my coworkers that although there is a new hot JavaScript framework every week, the algorithms, patterns and data structures that such frameworks use are normally from the 1990s at the latest.

Burnout

Since my senior years at university, I have worked as a dev lead for major companies like HP, ACTIA, BRASKEM, DELL and others. At that time I was a perfectionist and was responsible for critical and highly visible projects. Unfortunately, my stress management wasn't very effective, and around 2010 I had a very serious burnout and left IT. I started a small business with a brother-in-law selling custom-made furniture. The business was very successful, but I was doing something very far from my passion, and it was very stressful as well.

Return to IT

Those 2-3 years working outside the IT field were very important in showing me that the problem was not IT, but the way I saw and handled everyday situations. Around 2012, I decided to come back to IT and continue my career as a dev leader, but at the same time my interest in Zen Buddhism grew a lot and I started using its techniques to handle everyday situations. I have a section on this website where I describe the intersection between Buddhism and software development and how such an ancient philosophy can make us better developers and better humans.

TOOLS OF THE TRADE

Working in software development since the mid-1990s, I have experienced various eras and the ups and downs of the software industry.
From a time when an Oracle 7.3 server on a 100 MHz RISC machine was cutting-edge to our modern NoSQL databases in the cloud spanning thousands of nodes, we have come a long way. Here, I would like to share some of my views on the tools I currently use for my projects.

Philosophy

(Mind Tool)
Normally, when we work with software and think of tools, we think about stacks, databases, workflow tools and so on.
But the only tool we really use when we develop software is the mind.
Everything we code is created in the mind and then materialized as text files that a computer can understand and execute.

So, the questions "Why do I code?", "What is the nature of this code?" and "What is important to me?" are extremely important in the long run. Most of my colleagues from university have already left the software business, because for them software was only a tool to reach goals, not an art and a form of expression.

Why do I code?

This is by far the most important question you need to ask, and normally a question that comes to the programmer's mind in their 30s. That is why I think most famous software companies avoid hiring programmers older than 30: there is a view that they can better instill their philosophy in the young, before the early-30s crisis.
Some well-known examples of answers are:
  • I code because it pays well
  • I code because software companies are prestigious
  • I code because it makes me look cool with my MacBook Pro
  • I code because I want to have my own business
When I was in my 30s, my answer would have had a little bit of each of the above. But that doesn't actually help when you need to decide whether to keep your job or look for another one. Should I move into management and stop coding? And what if the tools and technologies I find so cool stop being cool?

So, in the end, you need to focus, and this is the most important thing you can do with the mind tool: focus on what is most valuable to you. Today, I code because I seek technical excellence, to solve harder and harder problems with the best engineering possible (engineering meaning: how easy the final product is to maintain and support).

This is a little hard to explain, because most people today focus on whether they make a lot of money or get a lot of recognition from peers, or both. A movie that has inspired me in recent years, and that shows someone committed to technical excellence, is Jiro Dreams of Sushi, which documents the story of someone who has fallen in love with his craft.

Stacks

(Which Foundation?)
Today, my favourite language is Python, which has gained tremendous momentum over the last couple of years. What most people don't realize is why it has become so popular. Normally people analyse the characteristics of the language itself, and the language has its merits: it is simple to code, not very verbose, easy to use and so on. But many other interpreted languages, like Lua or Ruby, have the same characteristics and are not as popular.

The killer feature of Python is actually how easy it is to manipulate low-level C/C++ libraries and APIs from Python. The cores of famous Python libraries like NumPy, scikit-learn, TensorFlow, OpenCV and many others are NOT written in Python; they are largely written in C and C++. Python is essentially a wrapper that allows those libraries to exchange data with each other through a very clear interface, in a programming language made to make data transformation easy.

The main problem with C++ from the 2010s onwards is the lack of a package manager like pip for Python, Maven for Java, gems for Ruby, Composer for PHP and so on. This is due to several reasons, mainly because C++ binaries are platform dependent, so you need to specify the processor architecture for which you are creating your artifact (x86/x64, ARM, SPARC). And if you needed an executable that used libraries built by different compiler versions, making everything work together was a very difficult task.

So, in a nutshell, if I want to use OpenCV, Tesseract and TensorFlow in the same process, linking all of those C++ libraries used to be a lot of hard work. Today you just type:
C:\>pip install opencv-python pytesseract tensorflow
to install the libraries on your machine (pytesseract also needs the Tesseract binary itself installed), and then open the Python REPL:
>>> import cv2
>>> import pytesseract
>>> import tensorflow
>>> # and here comes your code....
And you can start exchanging data between those libraries without having to worry about makefiles, compiler and linker versions or platform-specific binaries: the perfect "glue" between those C++ libraries.
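As a minimal sketch of that "glue" role (assuming the packages above are installed, the Tesseract binary is on the PATH, and "invoice.png" is just a hypothetical sample image), a NumPy array is the common currency that travels between the C++ cores of these libraries:

>>> import cv2
>>> import pytesseract
>>> img = cv2.imread("invoice.png")                 # OpenCV (C++) hands back a NumPy array
>>> gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)    # image processing runs in native code
>>> text = pytesseract.image_to_string(gray)        # the same array goes straight into Tesseract
>>> print(text[:80])                                # OCR result, ready for further processing in Python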

Python is not a statically typed language, so I strongly recommend against using it for applications where the domain model logic is very complex, like financial applications, shipping and other systems where you will have thousands of classes just to abstract the business problem you want to solve.

It was conceived as a scripting language, so it is best used when you need to execute a task and then exit the process, not for long-running processes and daemons.

(I'm usually bashed and flamed for this opinion, but it is my opinion. Sorry, Django boys!)

Most of my professional experience is in C#, which is a great language, easier to use than Java, with a lot of nice features such as LINQ, asynchronous programming and many others that Java is only starting to catch up with.

I currently work on an application with more than 15k classes and 1.2 million lines of code, and it is very stable and efficient. For long-running processes, NGEN provides AOT compilation that makes startup and execution times comparable to C++. Not to mention an IDE made specifically for the platform.

The only problem with C# is that all of the above is only true if you use Windows (.NET Core, Mono and the other alternatives are, in my opinion, not production-ready yet).

There was a time when hardware was king and software was only an accessory that came with it. Then, in the late 70s, came C and UNIX, the "universal assembly", so you could write a program and run it on almost any hardware (mainframes, of course). With time, operating systems became more complex and C was no longer a viable solution; then came Java in the late 90s with the promise of "write once, run anywhere", although at that time, when I wrote a program with JDK 1.1, it was extremely slow, and I asked myself: "Where could anyone actually run this?"

Java has come a long way since then, and the JVM today is one of the most heavily engineered pieces of software on the planet. If you target the JVM using Java or any of the dozens of JVM-compatible languages, you can run on virtually any platform, which by itself is an amazing feat of engineering.

Java has lost a lot of ground in web applications, as simpler frameworks and languages took its share of the market. With cloud computing, it really doesn't matter whether you run the application on a resource-efficient JVM or on an interpreted environment like Python or Ruby: CPU cycles have become very cheap.

Although Java on the web has lost some market share in recent years, Java is increasingly used for low-level server applications like messaging systems, NoSQL databases, data mining, text mining and so on, a niche traditionally occupied by C/C++.

In the late 1990s, C++ programming and "professional software" were synonymous; you were not expected to build anything professional that was not written in this language, mainly because CPU cycles were expensive at the time, so every cycle had to count. Today, CPU cycles are cheap, but programmer time is expensive and downtime is expensive, so simpler languages are preferred over writing large systems in C++, especially if their binaries are platform independent, as with Java.

Until 2006, I coded almost exclusively in C++; today it is becoming rarer and rarer to code in this language. One of the main problems with C++ is that it is hard to combine multiple libraries in the same executable, mainly because the binaries produced are normally platform dependent. Python is now gaining a lot of momentum as a language to "glue" together libraries and software written in different languages, so it is even rarer to code directly in C++. The great irony is that although Python has the fame of being the king of data science languages, the truth is that C++ is the real language used in that domain, hidden inside Python classes.

Databases

(What, and Where, is my Data?)
In the beginning, there was the text file, the universal interface. But we humans needed to structure our data, so along came database files like the ones used by dBase III, with their infamous locking problems, especially when used over networks. Then, in the 1990s, came the relational database and the client-server architecture, which is still, in one way or another, the main architecture in use (whether you are using a web, desktop or mobile app).

But the relational database has its own constraints, as it is bound by the ACID properties of transactions (Atomicity, Consistency, Isolation and Durability), which, in my opinion, are essential for any business process, especially financial ones. For a few decades the relational model ruled as "king of the hill" in the vast majority of business applications... But...
Sometimes it is impossible to know the structure of the data before storing it in the database. One example is newspaper advertisements, where the properties of the data may vary over time and where the user wants the freedom to highlight and model the characteristics of the item being advertised instead of conforming to a preset database schema (document databases).
Or the case of highly real-time workloads where you only need a queue or a list of values and you must guarantee that, although you serve many connections, you operate internally in a single-threaded manner: every operation is serialized, but you still need a very predictable response time, which relational databases can't guarantee because of the consistency machinery implemented through row locking (key-value stores).
Or when you have a critical table in your database that spans 750 TB, like Facebook posts, and needs something like 99.999999% availability. Then you need a distributed database spanning hundreds of thousands of nodes, where the nodes hold replica copies of one another's data (column-store databases).
For these use cases, the relational model falls short, so we need NoSQL databases (key-value, column-store, document databases and so on). I have been migrating many tables of the legacy applications I support into NoSQL databases, especially Redis (key-value) and MongoDB (document), with amazing results.
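As a minimal sketch of how the two are used side by side (assuming local Redis and MongoDB instances and the redis and pymongo packages; the key and collection names here are only illustrative):

import redis
from pymongo import MongoClient

r = redis.Redis(host="localhost", port=6379, db=0)
db = MongoClient("mongodb://localhost:27017")["crm"]

# Key-value store: real-time state with predictable, lock-free access
r.lpush("dialer:pending_calls", "555-1234")        # enqueue a call to be made
next_call = r.rpop("dialer:pending_calls")         # O(1) dequeue, no row locking

# Document store: records whose fields vary from one item to the next
db.ads.insert_one({
    "title": "Used car",
    "price": 35000,
    "extras": {"color": "red", "sunroof": True},   # free-form attributes, no preset schema
})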

But the truth is that 90% of the data is still stored in the relational database (Postgres), and the 10% that has special requirements goes into the NoSQL databases.

Workflow

(Keep track of the work)
The main responsibilities of the development leader role, in my understanding, are:
  • Be Responsible for the Technical Solution of the Software
  • Be Responsible for the Integrity of the Software Artifacts
    (Source Code, Builds, Binaries, Documentation)
  • Ensure a productive and fluent workflow for the team
Sometimes I joke that the dev lead is like an RDBMS: they need to ensure that everybody gets their piece of the work, on time and in a consistent way, and that any change to the project's artifacts is done in the most durable and consistent way possible.
When I started working in software development in the 1990s, it was very rare to have large software development teams. A large team was something like 5-10 people, and at that time there were very few tools to help us keep track of code and artifact changes in a project. Good old SourceSafe was king on the Microsoft stack, and on Linux, well, rsync...
And well into the 2010s, it was still difficult to keep track of changes and to be able to work on different features at the same time.

The project I manage today has over 1.2 million lines of .NET code and has been in development since 2003; it lived in SourceSafe, then TFS, and is now stored in Git.
When we used TFS or SourceSafe, if someone needed to work on a feature, we simply copied the entire source tree, and when the changes had to be merged back into the main repository, it was 2-3 days of work and prayer, because you had typically been working on your own copy of the source tree for months.
Linus Torvalds explains this situation in this video from 2007 (at around the 22-minute mark).

Then the man from the video linked above changed everything. In the early 2000s, Linux was a project with hundreds of collaborators and millions of lines of code, and it was simply too big for the kind of tools available at that time. Of course, you could spend a few hundred thousand dollars on a tool like Rational ClearCase, but that was not something an open source project, where most of the developers might simply be hobbyists, could use. Then Git changed everything, because:
  • It doesn't rely on a central server, but on the filesystem of your own computer, so you can work offline
  • Every file, commit and folder is an object, and commits form a linked chain; more importantly, everything is checksummed, so there is little risk of silent corruption (see the sketch after this list)
  • Such chains of commits can fork to create a new branch (a line of development) without creating a new repository or copying everything into a new working tree; it is done in place, with a single command
  • Because branches are cheap to create, they stay smaller and are easier to merge, so you can merge more often and reduce the risk of bad merges: you are merging a couple of hundred lines of changes that you can actually review, not tens of thousands of lines where checking every change is impossible
  • Although all operations work offline, you can easily pull and push changes to remote repos that are full copies of the local one
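As a small illustration of what "everything is checksummed" means (a sketch of Git's content addressing, not anything specific to my own workflow): every object is named by the SHA-1 hash of its own content, so any corruption changes the name and is immediately detectable.

import hashlib

def git_blob_hash(content: bytes) -> str:
    # Git names a blob by hashing a short header plus the file content
    header = f"blob {len(content)}\0".encode()
    return hashlib.sha1(header + content).hexdigest()

# Prints the same 40-character id that `git hash-object` would compute for this content
print(git_blob_hash(b"hello world\n"))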
Git is great; it has saved thousands of hours of sleep for dev leads and for everyone whose role is to keep track of a project's source code and artifacts. And since we now have a standard way of storing source code, we can have issue-tracking and requirement-tracking tools that integrate with the source code itself.
Tools like GitLab and GitHub embed the entire software development workflow, and in my opinion, for tracking source code and documents, there is no need for any tool other than these two.
Today they are roughly equivalent; the main difference is that GitLab has free private repos, while GitHub doesn't. But GitHub is still the standard tool, and it is where most open source software is stored and developed.
GitLab has a community edition that you can download and install on a private server; I use this version for my professional projects, while my public projects are stored on GitHub.

PORTFOLIO

Here are some of my projects from the last 10 years or so. It is important to note that for most of my working life I have been a dev lead, so I haven't had much time to invest in supporting open-source projects, which is one of the regrets of my professional life, although I have worked on a lot of side projects and volunteer work.

Zanc - MultCob / MultCall

(Current project, since 2015; .NET stack, PostgreSQL and Redis databases)
Context: Zanc is one of the largest debt-collection companies in Brazil. A debt-collection company is a service company that receives collection leads from banks and major financial institutions and then tries to collect the pending debt from the customer; this can involve credit cards, vehicles, real estate, utility bills, phone bills and so on.

From a software perspective, such a company requires a complex and highly customized CRM in which you can build elaborate workflows. A typical workflow is: receiving collection leads from the financial institutions, updating customer information in the database, load-balancing the leads among the collection operators in a call center, and finally managing the calls on the Asterisk PABX backend (in real time).

The CRM and dialer used by Zanc are 100% custom made, as it is still very hard to find an off-the-shelf CRM with debt-collection workflows ranging from real estate to credit cards.
It has been in development since 2003 using .NET (VB.NET and C#), with WPF and Windows Forms UIs and PostgreSQL and SQL Server backends. It is a very large project, with over 1 million lines of code and a Visual Studio 2012 solution file containing over 125 projects.

Problems: MultCob/MultCall had been in development since 2003 by a software company here in Porto Alegre; unfortunately, there were a lot of contract disagreements between this company and Zanc throughout 2013-2014.

So, around 2014, active development of the system halted, and the company that developed it provided only minimal support.

There were hundreds of small problems in the project, but the main issues were:
Poor performance when loading the debt-collection leads: Every night, hundreds of thousands of leads are loaded from the financial institutions' FTP servers. These files need to be parsed, loaded and stored in the databases, and the workflow and load-balancing for the next morning need to be determined. This operation was taking around 8 hours to complete, even though Zanc has its own datacenter and cutting-edge servers.
The Asterisk PABX dialer hung during working hours: Once the leads for each collection operator are determined, it is necessary to make sure that each operator is on a call, negotiating the debt with a customer. For this, the dialer server is constantly dialing in the background and forwarding the answered calls to the operators. This server hung several times during working hours, causing production stoppages and millions of dollars of lost income.

Solution: Unfortunately, Zanc had to cancel the contract with the software company that originally developed the system, and was left with mission-critical software without support. Fortunately, prior to the cancellation, Zanc had asked me to take over the project and assemble a team of developers to continue it, and as I am used to the firefighter dev-lead role, I accepted the challenge.

It is important to notice that, besides the hundreds of problems in the software itself, the project workflow was a mess, without proper source control or change management. The first thing we did was migrate the source control to Git using GitLab, and adopt Jenkins as a continuous integration tool in order to have a standard way of generating production and test binaries.

The two main problems were handled as follows:
Poor performance when loading the debt-collection leads: The root cause of this issue was that the lead load was single-threaded. So it really didn't matter that we had a 16-core server with 64 GB of RAM; the job never used more than a small fraction of the server's capacity. The solution was to take the hundreds of thousands of leads, split them into workloads of a few hundred each, and start hundreds of worker processes on the server using multiprocessing. The .NET service then just monitored the server load, spawning worker processes as capacity became available. The result was that a load that used to take around 10 hours was now done in 15 minutes. This was of great value to the business, which can now run several other processes during a night that used to be occupied entirely by the simple loading of leads into the system.
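The production service is written in .NET, but a minimal Python sketch of the same chunk-and-spawn pattern (the lead data and the process_chunk body are purely illustrative) would look like this:

from multiprocessing import Pool

def process_chunk(chunk):
    # Parse, validate and store a few hundred leads (placeholder for the real work)
    return len(chunk)

def split(leads, size=500):
    # Break the full nightly load into workloads of a few hundred leads each
    return [leads[i:i + size] for i in range(0, len(leads), size)]

if __name__ == "__main__":
    leads = [{"id": i} for i in range(200_000)]   # stand-in for the nightly FTP load
    with Pool() as pool:                          # one worker process per CPU core by default
        results = pool.map(process_chunk, split(leads))
    print(f"processed {sum(results)} leads in {len(results)} chunks")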
The Asterisk PABX dialer hung during working hours: A dialer is a real-time server that constantly monitors the status of the operators and, using regression techniques, tries to predict the calls that need to be placed, so that when an operator ends a call, a new one is forwarded to them within 15 seconds. This implies polling the database every few tens of milliseconds.
The main problem here was that a single relational database (SQL Server) was being used both for this real-time operation and for lead management and management reporting.
The solution was both to use worker processes to execute the dialing on Asterisk and to use a Redis database for the real-time state: every write goes to both Redis and SQL Server, but all the polling is done against Redis. This reduced the load on SQL Server by 70%, and the dialer never hung again.
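Again, the real dialer is a .NET service, but a minimal Python sketch of the write-through idea (the table, key and column names are hypothetical, and pyodbc stands in for whatever SQL Server driver is used):

import redis
import pyodbc

r = redis.Redis(host="localhost", port=6379, db=0)
sql = pyodbc.connect("DSN=dialer;UID=app;PWD=secret")   # hypothetical connection string

def record_operator_status(operator_id, status):
    # Write-through: persist in SQL Server for lead management and reporting...
    cur = sql.cursor()
    cur.execute("UPDATE operators SET status = ? WHERE id = ?", status, operator_id)
    sql.commit()
    # ...and mirror the same state into Redis for the real-time path
    r.hset("operator:status", operator_id, status)

def next_idle_operator():
    # The dialer polls Redis every few tens of milliseconds; SQL Server is never polled
    statuses = r.hgetall("operator:status")
    return next((op for op, st in statuses.items() if st == b"idle"), None)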

Dell - Global Warranty Adequacy Calculator

(2008-2011, .NET Stack, Oracle PL/SQL and SQL Server OLAP)
Context: Dell is one of the largest technology companies on the planet and sells a large portion of the corporate workstations and servers used worldwide. With tens of billions of dollars in sales every year, it has a very complex IT structure.
Problem: Despite its large IT budget and the thousands of systems used to run the company, one of the largest liabilities on Dell's balance sheet is the warranty contracts it holds with its customers. Every time a product is sold, a warranty contract is issued, and such contracts vary a lot by product, negotiation, geographic location and so on.
The liability generated by such contracts can reach as much as 4 billion dollars that Dell needs to provision for.
And despite being such a heavy item on the balance sheet (the largest single item), its calculation was done ad hoc by each business unit and geographic location, which caused a lot of auditing problems and reliability concerns.
Solution: First of all, there was no software specialized in calculating such data, or even in providing analytics on it. There were some scripts that loaded data from the Dell data warehouse, and analysts would then work for weeks in Excel to summarize the data. The chosen solution was to keep loading the raw data from the same source and build an ETL application that, in the end, produced an OLAP database that users could query and analyse. The 3 fact tables were incidents, call-center calls and the equipment actually sold; each table had between 500 million and 1 billion rows, and the process had to run monthly. The first run took around 1 week to complete, but with very aggressive table partitioning and tuning of the 250k lines of PL/SQL scripts involved, we managed to run the process in 48 hours, exporting the data as a SQL Server OLAP cube that could then be queried from Excel spreadsheets or a .NET website.

ACTIA - MultDi@g

(2003-2006, C/C++)
Context: ACTIA is one of the leading software and hardware development companies in the computer-based vehicle diagnostics business. If you take your car to a service shop and it is from a European manufacturer, there is a good chance that the software used to diagnose the electronic injection, dashboard and other ECUs is made by ACTIA. It is based in Toulouse, France (the same city where Airbus aircraft are assembled).
Problem: ACTIA sells a product called MultiDi@g that allows the diagnostics of ECUs for more than 50 European brands. Each brand has hundreds of control units that need to be diagnosed, but many of those units are the same across most brands or use very similar diagnostics protocols, like CAN, ISO 14230, ISO 9141 and so on.
The hardware sold with the product is the ACTIA PassThru, which supports all the hardware protocols on the market. The software that manages the hardware is all written in C/C++ and is also quite standard.
The main challenge in the product is actually the UI, because of several requirements:
Variety of available UIs: Although MultiDi@g is sold to multi-brand service shops, its components should also be usable in products focused on a single automotive brand, like Citroën, Peugeot, Fiat and others, each with its own "skin".
Multimedia: The software needs to be as intuitive as possible, allowing videos, sound, animation and rich text, so that the service shop technician can easily perform the diagnostics without having to consult other technical reference manuals.
Multiple C/C++ libraries: The product is actually a consolidation of several stand-alone diagnostics tools, so for each brand there might be more than one low-level C/C++ library available. The product must therefore know which library to use, and those libraries need to play nicely with one another.

Solution: As we can see from the requirements above, the best solution was to use HTML and CSS, so we could easily create CSS stylesheets and markup for each software skin. At that point (2003) the best browser available was Mozilla, so we decided to use XUL, an XML-based markup similar to HTML but aimed at desktop applications, and to create XPCOM components that encapsulated the various diagnostics libraries and exposed them as JavaScript objects.
So, in the end, we only needed to write HTML/CSS/JS to perform vehicle diagnostics, as those JavaScript objects bridged the communication with the low-level C/C++ libraries; the diagnostics tool developer only needs to know those commonly used web technologies.

Demos:
Multdi@g in operation in a service shop

Other Projects

PyPokerBot (2017, Python): This is a side project to create a bot that, from a screenshot, can detect the information displayed by a poker client and then make poker decisions on its own. It has a screen-detection module using OpenCV and a C/C++ poker hand-strength analyser (which runs as a separate REST server). It works fine and is able to play on the PokerStars site on its own, but it is missing a better AI in order to beat stronger players.
 
AES Encryption Layer for SQLite (2016, C/C++/.NET): This is a freelance project I did some time ago, where it was necessary to add an AES layer to SQLite databases so that the database file could not be read without supplying the correct encryption keys.
Developed in pure C (like SQLite itself), it is a good example of how to customize such a popular database library.
 
KL (2013, PHP / MySQL): Khadro Ling is a major Buddhist center in southern Brazil, and also a hub for many events and conferences that happen throughout the year. Its main source of revenue is the hospitality services it provides to the people who attend such events.
Around 2013, I was very interested in Buddhism and there was a volunteering opportunity to rewrite their main hospitality system. The previous one was written in Borland Delphi and was showing its age, so I rewrote it from scratch using PHP and jQuery UI components. It controls the flow of events, as well as the enrollment, check-in and accommodation processes.
 
Agent-Based Investment Analysis (2003, Java / LISP): This was my final project for my bachelor's degree in Computer Science, for which you need to design a project that solves a computer science problem and present its usage and implementation details. My idea was to develop intelligent agents for financial markets that would interact with each other. This area relies heavily on artificial intelligence and machine learning techniques, but little attention is given to how to pipeline the different algorithms and techniques in order to improve their performance.
So I created this project, a Java platform that encapsulates intelligent agents implementing neural networks (in Java), expert systems (LISP), genetic algorithms and so on. It also includes a 150-page paper on the intersection between financial markets and AI; although it was written in 2003, many of the concepts are still current.

ZEN 禅

(and Software)

Where is the Software?

Some time ago, I was in a meeting with some managers discussing the requirements for the next development sprint of the software I manage at a large call-center company, when our director asked me:
-Bencke, what is the essence of our software?
I thought for some time and then answered:
-Surely, it is not the software itself, because if the essence of the software were the software running, we wouldn't be able to discuss it without running it.
-Also, it is not its code, because we can discuss and alter the software even when we are not running it or looking at its code.
-Surely, it is not the users either, because it can run even without users.

So, I concluded that:
-The essence of software is thought: our expectations, fears, beliefs and everything we think about a business process; what we automate is a thought process materialized in running code.
-The essence of an information system is the sum of all the thoughts, attachments and fears of everybody involved in its continuous creation. So, in our case, the call-center manager thinks of the system as a means to ensure we have as much contact with profitable customers as possible, the financial manager wants to make sure it can track where the company's time and resources are going, and the programmer brings their experience and expertise to choose the best algorithms for each task.

So it is the balance between the contributions of those people, each one with their own thoughts, that sums up the essence of the system; the software is actually thought, not running code on a computer host.
And as thoughts move and change, so does the system...

Mind? Ego? Who Am I?

So, what is our essence? What is the core of ourselves, and what is only an accessory?

Zen focuses on studying the boundaries of the mind and on observing it, and as we observe our internal processes we reach self-knowledge and a better understanding of our own behaviour.

This is mainly done through meditation practices (like zazen). Such practices allow us to observe the mind from a point outside of it. As I always say, it is necessary to put the backpack down (what we are carrying in our mind) in order to organize it.

So, in my understanding, there are 3 layers to what we popularly know as the "mind":
Ego: Our ego is a set of mental constructs that we create or that are given to us. Such constructs are our name, our nationality, the sets of beliefs that we hold as true or false, and also what we think happiness is or is not. For most of our life we work to validate and/or satisfy this set of beliefs.
(The ego is like the set of userspace programs on a computer, which run most of its operations.)

Mind: This is the rational, thinking mind that we use to process and create the ego structures. It is a tool, but it is not ourselves; it is more like a computer terminal where we can see results and interact with concepts and ideas. Today, most people think that we are our mind, as Descartes said: "I think, therefore I am", but this is not so: we are not our mind...
(The Mind is the operating system, where we define the do's and don'ts and also the rules for our thinking process).

Self: This is the lowest level of us. It is an observer, pure consciousness; it observes the mind, it feels, it decides and so on. It is full of wisdom and communicates with the mind through intuition. The main objective of meditation is to reveal the boundaries between the self and the mind, and to show that they are different areas of ourselves.
(The self is like the CPU, which actually runs the code but is not affected by it.)

What is Buddhism?

Buddhism originated in India around 500 BC from the teachings of a Hindu prince called Siddhartha Gautama, also known as the historical Buddha (a title which simply means "awakened one").

His teachings are based on 4 main points, the Four Noble Truths:
The Existence of Suffering: The truth is that we live in a world full of suffering, and what we do during our lives is run from it or hide from it.
(Ohh my god, software development is HARD!!!).

The Cause of Suffering: But there is a cause for our suffering, which is impermanence. The truth is that everything changes. And although things change, our ego naturally becomes attached or averse to various things and/or thoughts. So, over time, this ego becomes obsolete or, worse, leads us into illusion, which is the lack of the capacity to see things as they really are.
(This Framework is awesome, the language sucks, I am very good at architecting).

The Cessation of Suffering: So, if the cause of suffering is the attachment/aversion contained in our ego, there is a way to end the suffering, which is to understand those causes and accept the ever-changing nature of life, without creating attachments.
(Perhaps I am not that good after all; this framework has its strengths and weaknesses.)

The Method to Reach the Cessation of Suffering: The method to eliminate the ego and its attachments is called the Eightfold Path, and it mainly consists of practices like feeling things as they are, seeing things as they are, doing things as they are, and so on.
("I know that I know nothing").

From the points above, we can see that, for Buddhism, the main root cause of human suffering is self-delusion, created by the fear of suffering, which inflates the ego.

Zen and its Philosophy

It is really hard to know when Zen started or who its real founder was, because the method of Zen is actually the same one used by Siddhartha Gautama to reach enlightenment (Vipassana meditation).

What we know as Zen originated as Chan in China, founded by a Buddhist monk called Bodhidharma at the Shaolin Monastery. Later, a Japanese monk called Dogen travelled to China and brought its teachings back to Japan, where he founded the Soto school. So Zen is actually Chinese, and it developed most of its core teachings and practices during the Tang Dynasty (600-900 AD) in China, although the Zen we practice today comes from Japan.

Unlike other traditions, Zen focuses heavily on meditation, and all other practices are secondary. So it is important to notice that unless you practice on a regular basis, it is impossible to understand Zen's teachings and writings, like the koans or the Shobogenzo; Zen is the art of meditation and its effects.

Shoshin (Beginner's Mind)

After some time of meditation practice, in which you can observe and understand the inner workings of the ego and the mind, you start to develop what we call Shoshin, translated as "beginner's mind".

This is a feeling, a posture towards knowledge, in which you understand that although you may have a lot of experience with a certain task, all of that experience belongs to a past with different conditions and circumstances. Yes, the past is there for reference, but we see the task today as a new task, different from the ones we did in the past, as if we were doing it for the first time, as a beginner.

As Suzuki Roshi said, in the beginner's mind there are many possibilities that simply don't appear when you start a task with the have-done-it-a-thousand-times-before mindset.

(OK, I have created several hundred web pages, but I feel like it is the first time I am doing it, so let's explore what we can do best this time.)

Zen is Single-Player

Another effect, after some time of meditation practice, is the perception that Zen is single-player, which means that everything we understand as concepts, ideas and thoughts is entirely personal and doesn't apply to other people.

This is in contrast with our society's perception that other people's personal experiences can be emulated or copied, which is simply not true. The teachings, other people's experiences, the sutras, they are all "fingers pointing at the moon": they point to an understanding, but they are only road signs, not the experience or knowledge itself.

Past and Future

And finally, we understand that what we call past and future is actually contained in the present, and that there is neither past nor future from the mind's point of view.

The past is simply our interpretation of the memories and emotions imprinted on us by past events. But those imprints and interpretations also change with time, with our system of beliefs, attachments and aversions; that is why we selectively choose to forget some idea or fact from the past. So the past depends on our present view of life, our present ego: it is contained in the present, and each present has its own interpretation of each past event.

The future is our expectation of things to come, according to our ego's beliefs. It is also ever-changing, because our ego is always changing as well. It is said that many people live in the future, but that is not my understanding: they live in their ego, which has become so large that it simply leaves no space for the perceptions of the present moment.

Master Foo and the Programming Prodigy

There is a short tale about Master Foo regarding programming and the idea that our ego changes all the time, and with it the perception of past and future:

There was a time when rumors began to reach Master Foo and his students of a prodigiously gifted programmer, a young man who wandered the length and breadth of the land performing mighty feats of coding and humiliating all who dared set their skill against his.

Eventually this prodigy came to visit Master Foo, who received him politely and offered him tea. The Prodigy accepted with equal politeness and explained the motive for his visit.

“I have come to you,” he said “seeking a code and design review of my latest project. For it is of surpassing complexity, and I do not have peers capable of understanding it. Only an acknowledged master such as yourself (and here the Prodigy bowed deeply) can have the discernment required.”

Master Foo bowed politely in return and began examining the Prodigy's code. After some time he raised his eyes from the screen. “This code is at first sight very impressive,” he said. “It is elegant in design, utilizing original algorithms of great ingenuity, and appears to be implemented in a craftsmanlike way which minimizes the possibility of errors.”

The Prodigy looked very pleased at this praise, but Master Foo continued: “However, I detect one significant flaw.”

“Flaw?” the Prodigy said. “What flaw?”

“This code is difficult to read,” said Master Foo. “It is only thinly commented, its invariants are not specified, and I see no narrative description of its architecture or internal data structures anywhere. These problems will seriously impede your cooperation with other programmers.”

The Prodigy drew himself up haughtily. “I do not seek the cooperation of other programmers,” he said. “Every time I thought I had found one who might match me in skill I have been disappointed. Thus, I work alone.”

“But even the hacker who works alone,” said Master Foo, “collaborates with others, and must constantly communicate clearly to them, lest his work become confused and lost.”

“Of what others do you speak?” the Prodigy demanded.

Master Foo said: “All your future selves.”

Upon hearing this, the Prodigy was enlightened.

ZaZen

(.\mind --debug-break )
It is very important to notice that Zen is not something to be studied; it has to be practiced, as the concepts above only become clear when we observe the mind in meditation. Most texts, like the koans and the Shobogenzo, are only understandable from the point of view of someone sitting in meditation (zazen).

It is hard to learn to meditate correctly the first time. The best way to start is to attend a retreat or a special evening program at a local Buddhist center. You can start by watching videos on YouTube as well, but I guarantee it is not the same as having a group experience and/or someone giving instructions.

The start is the hardest part, because we have a monkey mind that always seeks attention, and breaking the habit of giving attention to the mind and our senses goes against decades of conditioning from our current education system. But it is nothing that can't be achieved in a few months.

(As the mind calms down, it opens the debug port, so we can start debugging ourselves).

DATA SCIENCE

For the last 5 years, I have been very interested in learning "data science", as it has become the most visible hype of the late 2010s.

But to understand this phenomenon, it is necessary to understand where the software industry stands and its impact on businesses and on industry overall.

What is Happening in the Software Industry?

It is important to understand that, prior to the 1990s, most business processes in most corporations were still on paper, with only the most critical processes being executed with the help of any computing device. Indicators and metrics on such processes were available, but normally only a very limited set of them, so business decisions often relied much more on "gut feeling" than on hard data.

In this scenario, where the information lives on hard copies, it is very hard to report on, track and analyse business information, at least beyond the main indicators.

All of this changed with the emergence of ERPs, CRMs and corporation-wide business information systems that tracked everything from the number of clicks on a web page to the exact route of every single employee inside the company's building. So, by the late 2010s, all of this data is available.

But although we have the data, we can't extract information, let alone knowledge, from it. This is the niche where data science comes in: we have a lot of data, but we don't know what to do with it.

Do we need Data Science?

So, we have this strange situation where we have huge datasets but can't extract knowledge or make sound business decisions from them, as most of the data is still unstructured. As in the 1990s, we have the reports from the CRMs and ERPs, which provide the main business indicators, but it is hard to analyse data that comes in unstructured formats, such as website logs, or data from outside the corporation's internal systems, such as vendor data, sales leads and many others.

And after we manage to make sense of the data, we need to find patterns and learn them: put the machine to work learning the patterns in the data, finding correlations, and from those correlations gain the insight to make business decisions.

So, "Data Science" is the marriage of software engineering to structure and arrange chaotic data from multiple data sources and statistical and algebrical algorithms to find patterns on such data.

It is important to notice that I never use the term AI (artificial intelligence), because we are still many decades away from it; what people consider AI is still simply algebraic methods for inference and for finding patterns that the human brain spots in milliseconds. In the current state of the art, AI is a very misleading term; I prefer "numerical computation", as what we actually do in "AI" nowadays is something like:
2a + b^2 + c^3 = 40   (find a, b and c)
But of course, with millions of variables.
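As a toy illustration of that "numerical computation" view (a minimal sketch using SciPy; the loss function is just the toy equation above, not any production model), we can let an optimizer search for a, b and c:

import numpy as np
from scipy.optimize import minimize

def loss(x):
    a, b, c = x
    # Squared error of the toy equation 2a + b^2 + c^3 = 40
    return (2 * a + b ** 2 + c ** 3 - 40) ** 2

result = minimize(loss, x0=np.array([1.0, 1.0, 1.0]))
a, b, c = result.x
print(a, b, c, 2 * a + b ** 2 + c ** 3)   # the expression should now be very close to 40

Machine learning is essentially this: minimizing an error function, just over millions of parameters instead of three.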

How do we work with data?

80% of a data scientist's work is actually organizing the unstructured data from many data sources and putting it into tabular form, or into an image or sound file, something that can be processed by a software program. 5% is actually creating the machine learning program that will try to analyse the data and search for patterns, and 15% is analysing and tweaking the model for better results.
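A minimal sketch of that 80% (the file names and columns are purely illustrative): pulling semi-structured records from different sources into one tabular frame that a model can consume.

import pandas as pd

# Two hypothetical sources: a CRM export (CSV) and raw call-center logs (JSON lines)
crm = pd.read_csv("crm_customers.csv")                # columns: customer_id, state, debt
calls = pd.read_json("call_logs.jsonl", lines=True)   # columns: customer_id, answered, ts

# Aggregate the raw log into per-customer features
features = calls.groupby("customer_id").agg(
    calls_made=("answered", "size"),
    calls_answered=("answered", "sum"),
)

# Join everything into a single tabular dataset, ready for a model
dataset = crm.merge(features, on="customer_id", how="left").fillna(0)
print(dataset.head())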

My Personal Experience with Data Science

Currently, I work with call-center CRMs, so the main goal is to locate the right customer and make a productive contact. It is normal for each customer to have thousands of fields and files of information, so if we can find patterns that maximize performance, this can have a huge impact.

I have already worked on several projects at Zanc trying to maximize the call center's efficiency. These projects are:

Deep Mailing: A project to find the variables that best predict whether a customer will answer a call from the call center. We analysed more than 200 variables looking for the most relevant ones, and managed to improve the response rate of our calls by 20%.
https://github.com/gbencke/DeepMailing

Herval Deep Mailing: This is a project made for the Herval company (http://www.herval.com.br/), to identify the most relevant characteristics of customers with good credit scores with the company.
https://github.com/gbencke/HervalDeepMailing

For both projects we used the Python stack, with Jupyter notebooks and the XGBoost library, to build gradient-boosted decision trees.
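A minimal sketch of that setup (synthetic data stands in for the real customer variables, and the hyperparameters are only illustrative):

import numpy as np
import xgboost as xgb
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the mailing dataset: 200 variables per customer
X = np.random.rand(10_000, 200)
y = (X[:, 0] + 0.5 * X[:, 1] + np.random.rand(10_000) > 1.2).astype(int)   # 1 = answered the call

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = xgb.XGBClassifier(n_estimators=200, max_depth=4, learning_rate=0.1)
model.fit(X_train, y_train)

print("accuracy:", model.score(X_test, y_test))
# The feature importances hint at which variables actually drive the response rate
print("top variables:", np.argsort(model.feature_importances_)[::-1][:10])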

Kaggle Competitions and Culture

Data science is based on 2 disciplines, computer science and statistics/numerical math, but as it is a way of arranging chaotic data and extracting insight from it, it is also a very challenging form of art.

Because of this, there are many websites with competitions where companies provide datasets and business problems for teams around the world to solve.

In my humble opinion, this is by far the best way to learn, as we are challenged to code and test very demanding projects with real datasets.

The most recent competitions I participated in are:

Santander Competition: This is a competition very similar to the Herval Deep Mailing project I did for Zanc, but with more variables (4492), using linear regression to estimate how much a customer is likely to invest in a financial product of the bank.
https://github.com/gbencke/Kaggle.2018.Santander.Competition

Data Science Bowl: This was my first project involving image processing, as the challenge requires correctly identifying the cells contained in thousands of microscope images provided by several pharmaceutical companies. It combines deep learning with image processing.
https://github.com/gbencke/Kaggle.DataScienceBowl.2018

HACKATHONS

(and Competitive Software Development)

IGEOC Hackathon
Deep Mailing Project (1st Place)

From the 5th to the 7th of May 2018, the IGEOC Institute sponsored a hackathon inviting the public to propose new tools and methodologies for the debt-collection business in Brazil. Instituto IGEOC is the association of debt-collection companies in Brazil, responsible for representing the sector and helping regulate it in conjunction with the Brazilian government.



Over 36 continuous hours, 12 groups worked to propose the best solution; the proposals included chatbots, natural language processing, self-service apps for automatic debt negotiation and many others.



Our solution was based on integrating the debt-collection processes using a BPMN-like tool that brings together the reception of debt data from the financial institutions, its validation and prioritization, and then its processing through the several contact tools we use today to reach the debtor.

First Place

The judges and the audience recognized that there is currently a lack of integration tools for this business and chose our tool as the best of the 12 presented.


Deep Mailing

With the motto that the main obstacle in reaching the customer is not the contact medium (email, SMS, chatbot, web page and so on) but the lack of tools for correctly planning such contact, we proposed a tool to integrate the several platforms we currently use to execute the contact. BPMN is the best standard for heterogeneous applications to exchange data and execute complex processes.

Architecture

The website uses a simple Bootstrap interface for the CRUD of the strategy, and the user can then design the process using bpmn-js. This library was customized to allow the creation of debt-collection-specific components.

Source Code

The source code for the project is available on my GitHub: HackathonIGEOC